Concepts
VectorPipe strives to be straightforward. With only a few simple function applications we can transform completely raw data into a grid of VectorTiles, ready for further processing. “Clipping” and “Collation” functions let us customize this process along the way.
Data Sources
A data source is any source of vector (i.e. geometric) data on the earth. It could come in any format (for example, OpenStreetMap).
For each data source that has first-class support, we expose a vectorpipe.* module with a matching name (for example, vectorpipe.osm). These modules expose all the types and functions necessary for transforming the raw data into the “Middle Ground” types.
No first-class support for your favourite data source? Want to write it yourself, and maybe even keep it private? That’s okay: just provide a function of type YourData => RDD[Feature[G, D]] and VectorPipe can handle the rest.
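As a rough illustration of the shape of such a function, here is a minimal sketch for a made-up line format. Everything in it is an assumption for the example: `CsvSourceSketch`, the "name,longitude,latitude" input format, and the simplified `Point` and `Feature` stand-ins are hypothetical, and a plain `Seq` stands in for Spark's `RDD` so that the snippet runs without any dependencies. Real code would use the GeoTrellis and Spark types instead.

```scala
import scala.util.Try

object CsvSourceSketch {
  // Stand-ins for illustration only; real code would use the GeoTrellis
  // Point and Feature types and return a Spark RDD rather than a Seq.
  final case class Point(x: Double, y: Double)
  final case class Feature[G, D](geom: G, data: D)

  /** Hypothetical raw input: lines of the form "name,longitude,latitude". */
  type YourData = Seq[String]

  /** Parse one line, returning None when it is malformed. */
  def parseLine(line: String): Option[Feature[Point, Map[String, String]]] =
    line.split(",").map(_.trim) match {
      case Array(name, lon, lat) =>
        // Non-numeric coordinates throw inside Try and become None
        Try(Feature(Point(lon.toDouble, lat.toDouble), Map("name" -> name))).toOption
      case _ => None
    }

  /** The YourData => collection-of-Features shape, skipping malformed lines. */
  def toFeatures(raw: YourData): Seq[Feature[Point, Map[String, String]]] =
    raw.flatMap(line => parseLine(line).toList)
}
```

The only design point the sketch tries to make is that all source-specific parsing stays inside this one function; everything downstream only sees Features.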
The “Middle Ground”
A collection of Geometries on the earth. The actual data can be distributed across multiple machines via Spark’s RDD type. From this “middle ground”, we can proceed with creating Vector Tiles, or (with the right supporting code) we could convert back into the format of the original source data.
Note that via the method VectorTile.toIterable, the following conversion is possible:
import geotrellis.spark._
import geotrellis.vector._
import geotrellis.vectortile._
import org.apache.spark._
import org.apache.spark.rdd.RDD
implicit val sc: SparkContext = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("back-to-middle-ground")
)
/* Mocked as `empty` for the example */
val tiles: RDD[(SpatialKey, VectorTile)] = sc.emptyRDD
/* A VT layer converted back to the "middle ground", possibly for recollation */
val backToMiddle: RDD[(SpatialKey, Iterable[Feature[Geometry, Map[String, Value]]])] =
  tiles.mapValues(_.toIterable)
/* Close up Spark nicely */
sc.stop()
Clipping Functions
GeoTrellis has a consistent RDD[(K, V)] pattern for handling grids of tiled data, where K is the grid index and V is the actual value type. Before RDD[(SpatialKey, VectorTile)] can be achieved, we need to convert our gridless RDD[Feature[G, D]] into such a grid, such that each Feature’s Geometry is reasonably clipped to the size of an individual tile. Depending on which clipping function you choose (from the vectorpipe.Clip object, or even your own custom one) the shape of the clipped Geometry will vary. See our Scaladocs for more detail on the available options.
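To make the "gridless to gridded" step more concrete, the sketch below works out which tile keys a Feature's bounding box falls on. It is not how vectorpipe.Clip is implemented and it does no actual geometry clipping; `GridSketch`, `Grid`, `keysFor`, and the simplified `Extent` and `SpatialKey` types are hypothetical stand-ins, and the row-0-at-the-top orientation is an assumption of the example.

```scala
object GridSketch {
  // Simplified stand-ins for illustration; not the GeoTrellis types.
  final case class Extent(xmin: Double, ymin: Double, xmax: Double, ymax: Double)
  final case class SpatialKey(col: Int, row: Int)

  /** A hypothetical fixed grid of cols x rows tiles covering `world`. */
  final case class Grid(world: Extent, cols: Int, rows: Int) {
    private val tileW = (world.xmax - world.xmin) / cols
    private val tileH = (world.ymax - world.ymin) / rows

    /** Keep an index inside the range 0 until n. */
    private def clamp(i: Int, n: Int): Int = math.max(0, math.min(n - 1, i))

    /** Every tile key that the bounding box overlaps or touches.
      * Row 0 is assumed to be the northernmost row. */
    def keysFor(bbox: Extent): Seq[SpatialKey] = {
      val c0 = clamp(math.floor((bbox.xmin - world.xmin) / tileW).toInt, cols)
      val c1 = clamp(math.floor((bbox.xmax - world.xmin) / tileW).toInt, cols)
      val r0 = clamp(math.floor((world.ymax - bbox.ymax) / tileH).toInt, rows)
      val r1 = clamp(math.floor((world.ymax - bbox.ymin) / tileH).toInt, rows)
      for (c <- c0 to c1; r <- r0 to r1) yield SpatialKey(c, r)
    }
  }
}
```

A Feature that straddles a tile boundary yields several keys, one per tile it overlaps; a clipping function then decides how much of the Geometry each of those tiles actually receives.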
Admittedly, we sometimes can’t guarantee the validity of incoming vector data. Clipping is known to occasionally fail on large, complex multipolygons, so we skip over these failures and optionally allow them to be logged. Any logging framework can be used.
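The skip-and-optionally-log behaviour can be pictured with the following sketch. It is only an illustration of the pattern, not VectorPipe's actual API: `SafeClipSketch`, `clipAll`, and the `logError` parameter are hypothetical names, and a `Seq` again stands in for an `RDD`.

```scala
import scala.util.{Failure, Success, Try}

object SafeClipSketch {
  /** Apply `clip` to every input, dropping the inputs on which it throws.
    * `logError` is a plain String => Unit, so any logging framework can be
    * plugged in; by default failures are skipped silently. */
  def clipAll[A, B](inputs: Seq[A], logError: String => Unit = _ => ())(clip: A => B): Seq[B] =
    inputs.flatMap { a =>
      Try(clip(a)) match {
        case Success(b) => List(b)
        case Failure(e) =>
          logError(s"Clipping failed for $a: ${e.getMessage}")
          Nil
      }
    }
}
```

Taking the logger as an ordinary function keeps the sketch free of any particular logging dependency: a caller could pass `println`, or a method from whichever logger they already use.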
Collation Functions
Once clipped and gridded by VectorPipe.toGrid, we have an RDD[(SpatialKey, Iterable[Feature[G, D]])] that represents all the Geometry fragments present at each tiled location on the earth. This is the perfect shape to turn into a VectorTile. To do so, we need to choose a Collator function, which determines what VectorTile Layer each Feature should be placed into, and how (if at all) its corresponding metadata (the D) should be processed.
Want to write your own Collator? The Collate.generically function will be of interest to you.
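The two decisions a Collator makes (which Layer a Feature goes into, and what happens to its metadata) can be sketched as below. This is not the signature of Collate.generically, which is documented in the Scaladocs; `CollateSketch`, `collate`, `layerOf`, `meta`, and the `Layers` alias (a plain Map standing in for a VectorTile) are all hypothetical names for the illustration.

```scala
object CollateSketch {
  // Simplified stand-in for illustration; not the GeoTrellis Feature type.
  final case class Feature[G, D](geom: G, data: D)

  /** Stand-in for a VectorTile: layer name -> the Features in that layer. */
  type Layers[G] = Map[String, Seq[Feature[G, Map[String, String]]]]

  /** Place each Feature in the layer chosen by `layerOf`, converting its
    * metadata D into simple key/value pairs with `meta`. */
  def collate[G, D](features: Iterable[Feature[G, D]])(
    layerOf: Feature[G, D] => String,
    meta: D => Map[String, String]
  ): Layers[G] =
    features.toSeq
      .groupBy(layerOf)
      .map { case (layer, fs) => layer -> fs.map(f => Feature(f.geom, meta(f.data))) }
}
```

For example, passing a `layerOf` that returns "roads" for anything tagged as a highway and "other" otherwise would split one tile's Features into two layers, while a `meta` that returns an empty Map would drop the metadata entirely.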
Output Targets
We can imagine two possible outputs for our completed grid of Vector Tiles:
- A compressed GeoTrellis layer, saved to S3 or elsewhere
- A dump of every tile as an .mvt, readable by other software
Either option is simple, but outputting an RDD[(SpatialKey, VectorTile)] isn’t actually the concern of VectorPipe; it can be handled entirely in client code via GeoTrellis functionality. An example of this can be found in this repository.