Class

org.apache.spark.rdd

FilteredCartesianRDD

Related Doc: package rdd

sealed class FilteredCartesianRDD[T, U, V] extends RDD[(T, U)] with Serializable

Performs a cartesian join of two RDDs using a filter-and-refine pattern.

When the RDD is declared, n*m partitions are generated, one for each possible cartesian mapping. During execution, the summary functions are applied to rdd1 and rdd2 in a map-side reduce. The results are collected and filtered with metapred to find partition pairings with potential matches. Pairings with possible matches are then checked with pred in a refinement step.

The filter step does not shuffle rdd1 or rdd2, but the records of the metardds, which are produced by the summary functions, are shuffled (as they must be). Each metardd contains one item per partition (for example, a "bounding box" of the records in the parent RDD), so this shuffle is assumed to be low cost.

Efficient execution assumes that potential matches exist for only a limited number of cartesian pairings. If no filtering is possible, the worst case is a full cartesian product.
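
The filter-and-refine flow described above can be illustrated without Spark. The following is a minimal sketch, not this class's implementation: plain Scala sequences stand in for partitions, and the names Bounds, summarize, metapred, pred, and filteredCartesian, as well as the distance threshold eps, are hypothetical choices made for this illustration. It assumes every partition is non-empty.

```scala
object FilterRefineSketch {
  // Per-partition summary (the "V" type): a closed integer interval.
  final case class Bounds(lo: Int, hi: Int)

  // Summary function: collapse one partition into a single bounding interval.
  // Assumption: the partition is non-empty.
  def summarize(part: Seq[Int]): Bounds = Bounds(part.min, part.max)

  // Filter predicate: could any record pair drawn from these two partitions
  // lie within eps of each other?
  def metapred(eps: Int)(a: Bounds, b: Bounds): Boolean =
    a.lo <= b.hi + eps && b.lo <= a.hi + eps

  // Refinement predicate: exact test on a single pair of records.
  def pred(eps: Int)(x: Int, y: Int): Boolean = math.abs(x - y) <= eps

  // Returns the number of partition pairings that pass the filter,
  // together with the record pairs that pass the refinement step.
  def filteredCartesian(
      left: Seq[Seq[Int]],
      right: Seq[Seq[Int]],
      eps: Int): (Int, Seq[(Int, Int)]) = {
    val leftMeta = left.map(summarize)
    val rightMeta = right.map(summarize)

    // Filter: consider all n * m pairings, but look only at the summaries.
    val candidates = for {
      i <- left.indices
      j <- right.indices
      if metapred(eps)(leftMeta(i), rightMeta(j))
    } yield (i, j)

    // Refine: compare actual records only for the surviving pairings.
    val matches = candidates.flatMap { case (i, j) =>
      for {
        x <- left(i)
        y <- right(j)
        if pred(eps)(x, y)
      } yield (x, y)
    }
    (candidates.size, matches)
  }
}
```

With left partitions Seq(1, 2, 3) and Seq(100, 101), right partitions Seq(3, 4), Seq(50, 60), and Seq(102, 110), and eps = 1, only 2 of the 6 partition pairings pass the filter, so the refinement step never compares records from the other 4.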

Linear Supertypes
RDD[(T, U)], Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new FilteredCartesianRDD(sc: SparkContext, pred: (T, U) ⇒ Boolean, metapred: (V, V) ⇒ Boolean, rdd1: RDD[T], summaryFn1: (Iterator[T]) ⇒ Iterator[V], rdd2: RDD[U], summaryFn2: (Iterator[U]) ⇒ Iterator[V])(implicit arg0: ClassTag[T], arg1: ClassTag[U], arg2: ClassTag[V])

    sc

    SparkContext

    pred

    refinement predicate

    metapred

    filter predicate

    rdd1

    RDD of elements on the left side of the cartesian join

    rdd2

    RDD of elements on the right side of the cartesian join
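
As an illustration of the function shapes this constructor expects, the sketch below defines one possible set of arguments for T = U = Double and V = a one-dimensional bounding box. Everything named here (BoundingBoxArgs, Box, boundingBox, boxesMayMatch, closeEnough, eps) is hypothetical, and encoding an empty partition as an inverted box is an assumption of this sketch, not documented behavior of the class.

```scala
object BoundingBoxArgs {
  // V: a one-dimensional bounding box (lo, hi).
  type Box = (Double, Double)

  // Hypothetical matching distance.
  val eps: Double = 0.5

  // Shape of summaryFn1 / summaryFn2: Iterator[T] => Iterator[V].
  // Emits exactly one box per partition. An empty partition yields
  // (+Infinity, -Infinity), which boxesMayMatch below rejects.
  def boundingBox(records: Iterator[Double]): Iterator[Box] = {
    val start: Box = (Double.PositiveInfinity, Double.NegativeInfinity)
    val box = records.foldLeft(start) { case ((lo, hi), x) =>
      (math.min(lo, x), math.max(hi, x))
    }
    Iterator.single(box)
  }

  // Shape of metapred: (V, V) => Boolean.
  // True when the two boxes come within eps of each other.
  val boxesMayMatch: (Box, Box) => Boolean =
    (a, b) => a._1 <= b._2 + eps && b._1 <= a._2 + eps

  // Shape of pred: (T, U) => Boolean. Exact test on one record pair.
  val closeEnough: (Double, Double) => Boolean =
    (x, y) => math.abs(x - y) <= eps

  // Given a SparkContext `sc` and two RDD[Double] values `left` and `right`
  // (not defined in this sketch), the documented constructor would take:
  //   new FilteredCartesianRDD(sc, closeEnough, boxesMayMatch,
  //     left, boundingBox, right, boundingBox)
}
```

The filter predicate must never reject a partition pairing that contains a real match, so boxesMayMatch widens the overlap test by the same eps that closeEnough uses.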

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. def ++(other: RDD[(T, U)]): RDD[(T, U)]

    Definition Classes
    RDD
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. def aggregate[U](zeroValue: U)(seqOp: (U, (T, U)) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): U

    Definition Classes
    RDD
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def cache(): FilteredCartesianRDD.this.type

    Definition Classes
    RDD
  8. def cartesian[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[((T, U), U)]

    Definition Classes
    RDD
  9. def checkpoint(): Unit

    Definition Classes
    RDD
  10. def clearDependencies(): Unit

    Definition Classes
    FilteredCartesianRDD → RDD
  11. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  12. def coalesce(numPartitions: Int, shuffle: Boolean, partitionCoalescer: Option[PartitionCoalescer])(implicit ord: Ordering[(T, U)]): RDD[(T, U)]

    Definition Classes
    RDD
  13. def collect[U](f: PartialFunction[(T, U), U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  14. def collect(): Array[(T, U)]

    Definition Classes
    RDD
  15. def compute(split: Partition, context: TaskContext): Iterator[(T, U)]

    Definition Classes
    FilteredCartesianRDD → RDD
  16. def context: SparkContext

    Definition Classes
    RDD
  17. def count(): Long

    Definition Classes
    RDD
  18. def countApprox(timeout: Long, confidence: Double): PartialResult[BoundedDouble]

    Definition Classes
    RDD
  19. def countApproxDistinct(relativeSD: Double): Long

    Definition Classes
    RDD
  20. def countApproxDistinct(p: Int, sp: Int): Long

    Definition Classes
    RDD
  21. def countByValue()(implicit ord: Ordering[(T, U)]): Map[(T, U), Long]

    Definition Classes
    RDD
  22. def countByValueApprox(timeout: Long, confidence: Double)(implicit ord: Ordering[(T, U)]): PartialResult[Map[(T, U), BoundedDouble]]

    Definition Classes
    RDD
  23. final def dependencies: Seq[Dependency[_]]

    Definition Classes
    RDD
  24. def distinct(): RDD[(T, U)]

    Definition Classes
    RDD
  25. def distinct(numPartitions: Int)(implicit ord: Ordering[(T, U)]): RDD[(T, U)]

    Definition Classes
    RDD
  26. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  27. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  28. def filter(f: ((T, U)) ⇒ Boolean): RDD[(T, U)]

    Definition Classes
    RDD
  29. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  30. def first(): (T, U)

    Definition Classes
    RDD
  31. def firstParent[U](implicit arg0: ClassTag[U]): RDD[U]

    Attributes
    protected[org.apache.spark]
    Definition Classes
    RDD
  32. def flatMap[U](f: ((T, U)) ⇒ TraversableOnce[U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  33. def fold(zeroValue: (T, U))(op: ((T, U), (T, U)) ⇒ (T, U)): (T, U)

    Definition Classes
    RDD
  34. def foreach(f: ((T, U)) ⇒ Unit): Unit

    Definition Classes
    RDD
  35. def foreachPartition(f: (Iterator[(T, U)]) ⇒ Unit): Unit

    Definition Classes
    RDD
  36. def getCheckpointFile: Option[String]

    Definition Classes
    RDD
  37. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  38. def getDependencies: Seq[Dependency[_]]

    Definition Classes
    FilteredCartesianRDD → RDD
  39. final def getNumPartitions: Int

    Definition Classes
    RDD
    Annotations
    @Since( "1.6.0" )
  40. def getPartitions: Array[Partition]

    Definition Classes
    FilteredCartesianRDD → RDD
  41. def getPreferredLocations(split: Partition): Seq[String]

    Definition Classes
    FilteredCartesianRDD → RDD
  42. def getStorageLevel: StorageLevel

    Definition Classes
    RDD
  43. def glom(): RDD[Array[(T, U)]]

    Definition Classes
    RDD
  44. def groupBy[K](f: ((T, U)) ⇒ K, p: Partitioner)(implicit kt: ClassTag[K], ord: Ordering[K]): RDD[(K, Iterable[(T, U)])]

    Definition Classes
    RDD
  45. def groupBy[K](f: ((T, U)) ⇒ K, numPartitions: Int)(implicit kt: ClassTag[K]): RDD[(K, Iterable[(T, U)])]

    Definition Classes
    RDD
  46. def groupBy[K](f: ((T, U)) ⇒ K)(implicit kt: ClassTag[K]): RDD[(K, Iterable[(T, U)])]

    Definition Classes
    RDD
  47. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  48. val id: Int

    Definition Classes
    RDD
  49. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  50. def intersection(other: RDD[(T, U)], numPartitions: Int): RDD[(T, U)]

    Definition Classes
    RDD
  51. def intersection(other: RDD[(T, U)], partitioner: Partitioner)(implicit ord: Ordering[(T, U)]): RDD[(T, U)]

    Definition Classes
    RDD
  52. def intersection(other: RDD[(T, U)]): RDD[(T, U)]

    Definition Classes
    RDD
  53. def isCheckpointed: Boolean

    Definition Classes
    RDD
  54. def isEmpty(): Boolean

    Definition Classes
    RDD
  55. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  56. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  57. final def iterator(split: Partition, context: TaskContext): Iterator[(T, U)]

    Definition Classes
    RDD
  58. def keyBy[K](f: ((T, U)) ⇒ K): RDD[(K, (T, U))]

    Definition Classes
    RDD
  59. def localCheckpoint(): FilteredCartesianRDD.this.type

    Definition Classes
    RDD
  60. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  61. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  62. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  63. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  64. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  65. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  66. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  67. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  68. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  69. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  70. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  71. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  72. def map[U](f: ((T, U)) ⇒ U)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  73. def mapPartitions[U](f: (Iterator[(T, U)]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  74. def mapPartitionsWithIndex[U](f: (Int, Iterator[(T, U)]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  75. def max()(implicit ord: Ordering[(T, U)]): (T, U)

    Definition Classes
    RDD
  76. val metardd1: RDD[V]

  77. val metardd2: RDD[V]

  78. def min()(implicit ord: Ordering[(T, U)]): (T, U)

    Definition Classes
    RDD
  79. var name: String

    Definition Classes
    RDD
  80. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  81. final def notify(): Unit

    Definition Classes
    AnyRef
  82. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  83. val numPartitionsInRdd2: Int

  84. def parent[U](j: Int)(implicit arg0: ClassTag[U]): RDD[U]

    Attributes
    protected[org.apache.spark]
    Definition Classes
    RDD
  85. val partitioner: Option[Partitioner]

    Definition Classes
    RDD
  86. final def partitions: Array[Partition]

    Definition Classes
    RDD
  87. def persist(): FilteredCartesianRDD.this.type

    Definition Classes
    RDD
  88. def persist(newLevel: StorageLevel): FilteredCartesianRDD.this.type

    Definition Classes
    RDD
  89. def pipe(command: Seq[String], env: Map[String, String], printPipeContext: ((String) ⇒ Unit) ⇒ Unit, printRDDElement: ((T, U), (String) ⇒ Unit) ⇒ Unit, separateWorkingDir: Boolean, bufferSize: Int, encoding: String): RDD[String]

    Definition Classes
    RDD
  90. def pipe(command: String, env: Map[String, String]): RDD[String]

    Definition Classes
    RDD
  91. def pipe(command: String): RDD[String]

    Definition Classes
    RDD
  92. final def preferredLocations(split: Partition): Seq[String]

    Definition Classes
    RDD
  93. def randomSplit(weights: Array[Double], seed: Long): Array[RDD[(T, U)]]

    Definition Classes
    RDD
  94. var rdd1: RDD[T]

    RDD of elements on the left side of the cartesian join

  95. var rdd2: RDD[U]

    RDD of elements on the right side of the cartesian join

  96. def reduce(f: ((T, U), (T, U)) ⇒ (T, U)): (T, U)

    Definition Classes
    RDD
  97. def repartition(numPartitions: Int)(implicit ord: Ordering[(T, U)]): RDD[(T, U)]

    Definition Classes
    RDD
  98. def sample(withReplacement: Boolean, fraction: Double, seed: Long): RDD[(T, U)]

    Definition Classes
    RDD
  99. def saveAsObjectFile(path: String): Unit

    Definition Classes
    RDD
  100. def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit

    Definition Classes
    RDD
  101. def saveAsTextFile(path: String): Unit

    Definition Classes
    RDD
  102. def setName(_name: String): FilteredCartesianRDD.this.type

    Definition Classes
    RDD
  103. def sortBy[K](f: ((T, U)) ⇒ K, ascending: Boolean, numPartitions: Int)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[(T, U)]

    Definition Classes
    RDD
  104. def sparkContext: SparkContext

    Definition Classes
    RDD
  105. def subtract(other: RDD[(T, U)], p: Partitioner)(implicit ord: Ordering[(T, U)]): RDD[(T, U)]

    Definition Classes
    RDD
  106. def subtract(other: RDD[(T, U)], numPartitions: Int): RDD[(T, U)]

    Definition Classes
    RDD
  107. def subtract(other: RDD[(T, U)]): RDD[(T, U)]

    Definition Classes
    RDD
  108. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  109. def take(num: Int): Array[(T, U)]

    Definition Classes
    RDD
  110. def takeOrdered(num: Int)(implicit ord: Ordering[(T, U)]): Array[(T, U)]

    Definition Classes
    RDD
  111. def takeSample(withReplacement: Boolean, num: Int, seed: Long): Array[(T, U)]

    Definition Classes
    RDD
  112. def toDebugString: String

    Definition Classes
    RDD
  113. def toJavaRDD(): JavaRDD[(T, U)]

    Definition Classes
    RDD
  114. def toLocalIterator: Iterator[(T, U)]

    Definition Classes
    RDD
  115. def toString(): String

    Definition Classes
    RDD → AnyRef → Any
  116. def top(num: Int)(implicit ord: Ordering[(T, U)]): Array[(T, U)]

    Definition Classes
    RDD
  117. def treeAggregate[U](zeroValue: U)(seqOp: (U, (T, U)) ⇒ U, combOp: (U, U) ⇒ U, depth: Int)(implicit arg0: ClassTag[U]): U

    Definition Classes
    RDD
  118. def treeReduce(f: ((T, U), (T, U)) ⇒ (T, U), depth: Int): (T, U)

    Definition Classes
    RDD
  119. def union(other: RDD[(T, U)]): RDD[(T, U)]

    Definition Classes
    RDD
  120. def unpersist(blocking: Boolean): FilteredCartesianRDD.this.type

    Definition Classes
    RDD
  121. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  122. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  123. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  124. def zip[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[((T, U), U)]

    Definition Classes
    RDD
  125. def zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D])(f: (Iterator[(T, U)], Iterator[B], Iterator[C], Iterator[D]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  126. def zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D], preservesPartitioning: Boolean)(f: (Iterator[(T, U)], Iterator[B], Iterator[C], Iterator[D]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  127. def zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C])(f: (Iterator[(T, U)], Iterator[B], Iterator[C]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  128. def zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C], preservesPartitioning: Boolean)(f: (Iterator[(T, U)], Iterator[B], Iterator[C]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  129. def zipPartitions[B, V](rdd2: RDD[B])(f: (Iterator[(T, U)], Iterator[B]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  130. def zipPartitions[B, V](rdd2: RDD[B], preservesPartitioning: Boolean)(f: (Iterator[(T, U)], Iterator[B]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  131. def zipWithIndex(): RDD[((T, U), Long)]

    Definition Classes
    RDD
  132. def zipWithUniqueId(): RDD[((T, U), Long)]

    Definition Classes
    RDD
