package rdd
Type Members
- sealed class FilteredCartesianRDD[T, U, V] extends RDD[(T, U)] with Serializable
Performs a cartesian join of two RDDs using a filter-and-refine pattern.
During RDD declaration, n*m partitions are generated, one for each possible cartesian pairing of partitions. During RDD execution, summary functions are applied in a map-side reduce to rdd1 and rdd2. These results are collected and filtered using metapred to find partition pairings with potential matches. Pairings with possible matches are then checked using pred in a refinement step.

No shuffle of rdd1 or rdd2 is performed by the filter step, but the records of the metardds, produced by the summary functions, are shuffled (as they must be). The metardds contain one item per partition (for example, a "bounding box" of the records in the parent RDD), so this shuffle is assumed to be low cost.

For efficient execution, it is assumed that potential matches exist for only a limited number of cartesian pairings. If no filtering is possible, the worst case is a full cartesian product.
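The filter-and-refine flow described above can be illustrated with plain Scala collections. This is only a sketch under stated assumptions: it does not use Spark or the actual constructor of FilteredCartesianRDD, each inner Seq stands in for one non-empty partition, and the names Bounds, summarize, join and eps are hypothetical. The summary here is a one-dimensional bounding interval, and the join condition (two values within eps of each other) is chosen only for illustration.

```scala
// Plain-Scala model of filter-and-refine; no Spark dependency.
// All names here are hypothetical and are not part of the documented API.
object FilterRefineSketch {
  // Per-partition summary: the closed interval [lo, hi] covering its records.
  final case class Bounds(lo: Int, hi: Int)

  // Summary function: reduce one (non-empty) partition to its bounds.
  def summarize(partition: Seq[Int]): Bounds =
    Bounds(partition.min, partition.max)

  // metapred: could any record pair from the two partitions be within eps?
  // True whenever the gap between the two intervals is at most eps.
  def metapred(eps: Int)(a: Bounds, b: Bounds): Boolean =
    a.lo - eps <= b.hi && b.lo - eps <= a.hi

  // pred: exact test on one pair of records.
  def pred(eps: Int)(x: Int, y: Int): Boolean = math.abs(x - y) <= eps

  // Returns the number of partition pairings that survive the filter step
  // and the record pairs that survive the refinement step.
  def join(left: Seq[Seq[Int]], right: Seq[Seq[Int]], eps: Int): (Int, Seq[(Int, Int)]) = {
    // One summary item per partition (the "metardds" in the description).
    val leftMeta = left.map(summarize)
    val rightMeta = right.map(summarize)

    // Filter: keep only partition pairings whose summaries may match.
    val candidates = for {
      i <- left.indices
      j <- right.indices
      if metapred(eps)(leftMeta(i), rightMeta(j))
    } yield (i, j)

    // Refine: run the exact predicate only inside surviving pairings.
    val matches = for {
      (i, j) <- candidates
      x <- left(i)
      y <- right(j)
      if pred(eps)(x, y)
    } yield (x, y)

    (candidates.size, matches)
  }
}
```

With two left partitions and three right partitions there are six possible pairings. For example, FilterRefineSketch.join(Seq(Seq(1, 2, 3), Seq(100, 101)), Seq(Seq(2, 4), Seq(50, 60), Seq(102, 200)), 1) keeps only two of them after the filter step, so pred is evaluated on far fewer record pairs than a full cartesian product would require.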
- type IsWritable[A] = (A) ⇒ Writable