What's the difference between join and cogroup in Apache Spark? What's the use case for each method?
Spark cogroup Function
In Spark, the cogroup function operates on two pair datasets, say of types (K, V) and (K, W), and returns a dataset of (K, (Iterable[V], Iterable[W])) tuples. This operation is also known as groupWith.
Spark SQL supports several join types: inner join, cross join, left outer join, right outer join, full outer join, left semi join, and left anti join. Which join to use depends on the business use case, and some joins are considerably more expensive in resources and computation than others.
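As a minimal sketch of how these join types are selected in practice (the SparkSession, DataFrames, and column names below are hypothetical), the third argument to DataFrame.join picks the join type:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("join-types").getOrCreate()
import spark.implicits._

// two small example DataFrames (hypothetical data)
val emp  = Seq((1, "Alice", 10), (2, "Bob", 20)).toDF("id", "name", "deptId")
val dept = Seq((10, "Sales"), (30, "HR")).toDF("deptId", "deptName")

emp.join(dept, Seq("deptId"), "inner").show()      // matching keys only
emp.join(dept, Seq("deptId"), "left_outer").show() // all emp rows, nulls where no match
emp.join(dept, Seq("deptId"), "full_outer").show() // all rows from both sides
emp.join(dept, Seq("deptId"), "left_semi").show()  // emp rows that have a match (emp columns only)
emp.join(dept, Seq("deptId"), "left_anti").show()  // emp rows with no match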
Sticking to the use cases mentioned above, Spark will perform (or can be forced by us to perform) joins in two different ways: a Sort Merge Join when we are joining two big tables, or a Broadcast Join when at least one of the datasets involved is small enough to fit in the memory of each single executor.
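A sketch of forcing the broadcast strategy, reusing the hypothetical emp and dept DataFrames from above:

import org.apache.spark.sql.functions.broadcast

// hint that dept is small enough to be copied to every executor;
// without the hint, Spark decides based on spark.sql.autoBroadcastJoinThreshold
val byBroadcast = emp.join(broadcast(dept), Seq("deptId"))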
Note that join is a transformation, not an action: it is evaluated lazily.
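To illustrate this laziness with a minimal, self-contained sketch (the RDDs here are hypothetical):

val left  = sc.parallelize(Seq(("A", 1), ("B", 2)))
val right = sc.parallelize(Seq(("A", "a")))

// join only records the lineage; no shuffle runs yet
val joined = left.join(right)

// the computation is triggered only by an action
joined.collect()  // Array((A,(1,a)))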
Let me help clarify them; both are commonly used and important!
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
This is the signature of join; look at it carefully. For example,
val rdd1 = sc.makeRDD(Array(("A","1"),("B","2"),("C","3")),2)
val rdd2 = sc.makeRDD(Array(("A","a"),("C","c"),("D","d")),2)

scala> rdd1.join(rdd2).collect
res0: Array[(String, (String, String))] = Array((A,(1,a)), (C,(3,c)))
All keys that appear in the final result are common to rdd1 and rdd2. This is similar to the relational database operation INNER JOIN.
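If you need SQL-style outer semantics on RDDs, the pair-RDD API also offers leftOuterJoin, rightOuterJoin, and fullOuterJoin, which wrap the possibly missing side in Option. A sketch on the same rdd1 and rdd2 (the res numbers and output ordering shown are illustrative):

scala> rdd1.leftOuterJoin(rdd2).collect
res1: Array[(String, (String, Option[String]))] = Array((B,(2,None)), (A,(1,Some(a))), (C,(3,Some(c))))

scala> rdd1.fullOuterJoin(rdd2).collect
res2: Array[(String, (Option[String], Option[String]))] = Array((B,(Some(2),None)), (D,(None,Some(d))), (A,(Some(1),Some(a))), (C,(Some(3),Some(c))))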
But cogroup is different,
def cogroup[W](other: RDD[(K, W)]): RDD[(K, (Iterable[V], Iterable[W]))]
if a key appears in at least one of the two RDDs, it will appear in the final result. Let me clarify:
val rdd1 = sc.makeRDD(Array(("A","1"),("B","2"),("C","3")),2)
val rdd2 = sc.makeRDD(Array(("A","a"),("C","c"),("D","d")),2)

scala> var rdd3 = rdd1.cogroup(rdd2).collect
res0: Array[(String, (Iterable[String], Iterable[String]))] = Array(
  (B,(CompactBuffer(2),CompactBuffer())),
  (D,(CompactBuffer(),CompactBuffer(d))),
  (A,(CompactBuffer(1),CompactBuffer(a))),
  (C,(CompactBuffer(3),CompactBuffer(c)))
)
This is very similar to the relational database operation FULL OUTER JOIN, but instead of flattening the result into one row per matched record, it gives you the iterable interface per key; what you do with it afterwards is entirely up to you!
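For example, here is a minimal sketch (assuming you want Option values like fullOuterJoin produces) of flattening the cogroup result into FULL OUTER JOIN rows yourself:

// flatten each key's pair of iterables into (Option[V], Option[W]) rows
val fullOuter = rdd1.cogroup(rdd2).flatMapValues {
  case (vs, ws) if ws.isEmpty => vs.map(v => (Some(v), None))
  case (vs, ws) if vs.isEmpty => ws.map(w => (None, Some(w)))
  case (vs, ws)               => for (v <- vs; w <- ws) yield (Some(v), Some(w))
}

fullOuter.collect  // same rows a FULL OUTER JOIN would produce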
Good Luck!
The Spark docs are here: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions