Package vectorpipe

package vectorpipe

VectorPipe is a library for mass conversion of Vector data into Mapbox VectorTiles. It is powered by GeoTrellis and Apache Spark.

Outline

GeoTrellis and Spark do most of our work for us. A main function that uses VectorPipe need not contain much more than:

import geotrellis.proj4.WebMercator
import geotrellis.spark._  /* SpatialKey */
import geotrellis.spark.tiling.{LayoutDefinition, ZoomedLayoutScheme}
import geotrellis.vectortile.VectorTile
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success}
import vectorpipe._
import vectorpipe.osm._  /* OSMFeature */

implicit val ss: SparkSession = ...

val layout: LayoutDefinition =
  ZoomedLayoutScheme.layoutForZoom(15, WebMercator.worldExtent, 512)

/* An ORC file containing OSM data. */
val path: String = "s3://path/to/data.orc"

osm.fromORC(path) match {
  case Failure(_) => { /* Handle the error. Was your path correct? */ }
  case Success((nodes, ways, relations)) => {

    val features: RDD[OSMFeature] =
      osm.features(nodes, ways, relations).geometries

    /* `grid` also expects an error-logging strategy; `logToLog4j` is one
     * of the defaults provided by this package. */
    val featGrid: RDD[(SpatialKey, Iterable[OSMFeature])] =
      grid(Clip.byHybrid, logToLog4j, layout, features)

    val tiles: RDD[(SpatialKey, VectorTile)] =
      vectortiles(Collate.byAnalytics, layout, featGrid)

    /* ... further processing / output ... */
  }
}

/* Nicely stop Spark */
ss.stop()

Writing Portable Tiles

This method outputs VectorTiles to a directory structure appropriate for serving by a Tile Map Server. The VectorTiles themselves are saved in the usual .mvt format, and so can be read by any other tool. The following example writes the tiles from above to an S3 bucket:

import geotrellis.spark._        /* SpatialKey, LayerId */
import geotrellis.spark.io.s3._  // requires the `geotrellis-s3` library

/* How should a `SpatialKey` map to a filepath on S3? */
val s3PathFromKey: SpatialKey => String = SaveToS3.spatialKeyToPath(
  LayerId("sample", 1),  // Whatever zoom level it is
  "s3://some-bucket/catalog/{name}/{z}/{x}/{y}.mvt"
)

tiles.saveToS3(s3PathFromKey)

Writing a GeoTrellis Layer of VectorTiles

The disadvantage of the "Portable Tiles" approach is that there is no way to read the tiles back into an RDD[(SpatialKey, VectorTile)] for further Spark-based manipulation. To allow that, the tiles have to be written as a "GeoTrellis Layer" from the get-go. The output of such a write is a set of split, compressed files that aren't readable by other tools, but the compression shrinks VectorTiles to about half the size of a normal .mvt.

import geotrellis.spark._
import geotrellis.spark.io._
import geotrellis.spark.io.file._    /* When writing to your local computer */
import geotrellis.spark.io.index.ZCurveKeyIndexMethod
import org.apache.spark.storage.StorageLevel

/* IO classes */
val catalog: String = "/home/you/tiles/"  /* This must exist ahead of time! */
val store = FileAttributeStore(catalog)
val writer = FileLayerWriter(store)

/* Almost certainly necessary, to save Spark from repeating effort */
val persisted = tiles.persist(StorageLevel.MEMORY_AND_DISK_SER)

/* Dynamically determine the KeyBounds */
val bounds: KeyBounds[SpatialKey] =
  persisted.map({ case (key, _) => KeyBounds(key, key) }).reduce(_ combine _)

/* Construct metadata for the Layer */
val meta = LayerMetadata(layout, bounds)

/* Write the Tile Layer */
writer.write(LayerId("north-van", 15), ContextRDD(persisted, meta), ZCurveKeyIndexMethod)
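
The payoff of a GeoTrellis Layer is that it can later be read back for more Spark-side work. A minimal sketch of that read, assuming an implicit SparkContext is in scope (e.g. ss.sparkContext) and that the implicit instances FileLayerReader asks for (a JsonFormat for LayerMetadata, among others) are available:

/* Read the Layer back; the implicit `vectorTileCodec` below decodes the tiles. */
val reader = FileLayerReader(store)

val layer = reader.read[SpatialKey, VectorTile, LayerMetadata[SpatialKey]](
  LayerId("north-van", 15)
)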

Type Members

  1. case class LayerMetadata[K](layout: LayoutDefinition, bounds: KeyBounds[K])(implicit evidence$1: JsonFormat[K]) extends Product with Serializable

    Minimalist Layer-level metadata. Necessary for writing layers of VectorTiles.

Value Members

  1. object Clip

    Clipping Strategies.

  2. object Collate

    "Collator" or "Schema" functions which form VectorTiles from collections of GeoTrellis Features. Any function can be considered a valid "collator" if it satisfies the type:

    collate: (Extent, Iterable[Feature[G,D]]) => VectorTile

    Usage

    Create a VectorTile from some collection of GeoTrellis Geometries:

    val tileExtent: Extent = ... // Extent of _this_ Tile
    val geoms: Iterable[Feature[Geometry, Map[String, String]]] = ...  // Some collection of Geometries
    
    val tile: VectorTile = Collate.withStringMetadata(tileExtent, geoms)

    Create a VectorTile via some custom collation scheme:

    def partition(f: Feature[G,D]): String = ...
    def metadata(d: D): Map[String, Value] = ...
    
    val tileExtent: Extent = ... // Extent of _this_ Tile
    val geoms: Iterable[Feature[G, D]] = ...  // Some collection of Geometries
    
    val tile: VectorTile = Collate.generically(tileExtent, geoms, partition, metadata)

    Writing your own Collator Function

    We provide a few defaults here, but any collation scheme is possible. Collation just refers to the process of organizing some Iterable collection of Geometries into various VectorTile Layers. The easiest way to write your own collator is with generically, which expects a partition function to guide Geometries into separate Layers, and a metadata transformation function.

    Partition Functions

    A valid partition function must be of the type:

    partition: Feature[G,D] => String

    The output String is the name of the Layer you'd like a given Feature assigned to. Notice that the entire Feature is available (i.e. both its Geometry and its metadata), so your partitioner can make fine-grained choices.
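
    For instance, a hypothetical partitioner, assuming D = Map[String, String] (as in withStringMetadata above), might route Features by an OSM tag:

    import geotrellis.vector.{Feature, Geometry}

    /* Features carrying a "highway" tag go to a "roads" Layer;
     * everything else lands in a catch-all Layer. */
    def byHighway(f: Feature[Geometry, Map[String, String]]): String =
      if (f.data.contains("highway")) "roads" else "other"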

    Metadata Transformation Functions

    One of these takes your D type and transforms it into what VectorTiles expect:

    metadata: D => Map[String, Value]

    You're encouraged to review the Value sum-type in geotrellis.vectortile.
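
    As a minimal sketch, assuming D = Map[String, String], each raw value can simply be wrapped in the VString variant:

    import geotrellis.vectortile.{VString, Value}

    /* Wrap every raw String in the VString variant of the Value sum-type. */
    def stringMeta(d: Map[String, String]): Map[String, Value] =
      d.map { case (k, v) => k -> VString(v) }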

    On Winding Order

    VectorTiles require that Polygon exteriors have clockwise winding order, and that interior holes have counter-clockwise winding order, assuming the origin (0,0) is in the top-left corner.

    Any custom collator which does not call generically must correct for Polygon winding order manually. This can be done via the vectorpipe.winding function.

    But why correct for winding order at all? OSM data makes no guarantee about what winding order its derived Polygons will have. We could correct the winding order when our first RDD[OSMFeature] is created, except that it's unlikely the clipping process afterward would maintain it for all Polygons.
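
    For illustration, a custom collator might normalize each Polygon Feature just before writing it into a Layer. A sketch, assuming Polygon-only Features:

    import geotrellis.vector.{Feature, Polygon}

    /* Correct the winding order of a Polygon Feature via vectorpipe.winding. */
    def fixWinding[D](f: Feature[Polygon, D]): Feature[Polygon, D] =
      Feature(winding(f.geom), f.data)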

  3. object LayerMetadata extends Serializable

  4. def grid[D](clip: (Extent, Feature[Geometry, D], Predicates) ⇒ Option[Feature[Geometry, D]], logError: (((Extent, Feature[Geometry, D])) ⇒ String) ⇒ ((Extent, Feature[Geometry, D])) ⇒ Unit, ld: LayoutDefinition, rdd: RDD[Feature[Geometry, D]]): RDD[(SpatialKey, Iterable[Feature[Geometry, D]])]

    Given a particular Layout (tile grid), split a collection of Features into a grid of them indexed by SpatialKey.

    Clipping Strategies

    A clipping strategy defines how Geometries which stretch outside their associated bounding box should be reduced to better fit it. This is beneficial, as it saves on storage for large, complex Geometries which only partially intersect some bounding box. The excess points will be cut out, but the "how" is a matter of weighing pros and cons in the context of the user's use-case. Several strategies come to mind:

    • Clip directly on the bounding box
    • Clip just outside the bounding box
    • Keep the nearest Point outside the bounding box, wherever it is
    • Custom clipping for each OSM Element type (building, etc)
    • Don't clip

    These clipping strategies are defined in vectorpipe.Clip, where you can find further explanation.

    clip

    A function which represents a "clipping strategy".

    logError

    An IO function that will log any clipping failures.

    ld

    The LayoutDefinition defining the area to gridify.
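
    Putting the parameters together, a sketch of a typical call, where rawFeatures is assumed to already exist, and Clip.byHybrid and logToLog4j are defaults from this package:

    val gridded: RDD[(SpatialKey, Iterable[Feature[Geometry, Map[String, String]]])] =
      grid(Clip.byHybrid, logToLog4j, layout, rawFeatures)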

  5. def logNothing[A](f: (A) ⇒ String): (A) ⇒ Unit

    Silently skip over a failure.

  6. def logToLog4j[A](f: (A) ⇒ String): (A) ⇒ Unit

    Log an error as an ERROR through Spark's default log4j.

  7. def logToStdout[A](f: (A) ⇒ String): (A) ⇒ Unit

    Log an error to STDOUT.
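
    Any function of this shape will do. For instance, a hypothetical variant that writes to STDERR instead:

    /* Render the failed input with `f`, then print the result to STDERR. */
    def logToStderr[A](f: A => String): A => Unit =
      { a => System.err.println(f(a)) }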

  8. package osm

    Types and functions unique to working with OpenStreetMap data.

  9. implicit val vectorTileCodec: AvroRecordCodec[VectorTile]

    Encode a VectorTile via Avro. This is the glue for Layer IO.

  10. def vectortiles[G <: Geometry, D](collate: (Extent, Iterable[Feature[G, D]]) ⇒ VectorTile, ld: LayoutDefinition, rdd: RDD[(SpatialKey, Iterable[Feature[G, D]])]): RDD[(SpatialKey, VectorTile)]

    Given a collection of GeoTrellis Features which have been associated with some SpatialKey and a "collation" function, form those Features into a VectorTile.

    See also

    vectorpipe.Collate

  11. def winding(p: Polygon): Polygon

    Ensure a geotrellis.vector.Polygon has the correct winding order to be used in a VectorTile.

Actions

Functions to transform RDDs of Features along the pipeline.

Error Logging

Useful defaults for functions like vectorpipe.grid, where we wish to log small failures and skip them, instead of crashing the entire Spark job.
