
RDD.union vs SparkContext.union

Tags:

apache-spark

What is the difference between

from functools import reduce  # needed on Python 3, where reduce is not a builtin
reduce(lambda x, y: x.union(y), myRDDlist)

which calls RDD.union and

sc.union(myRDDlist)

which calls SparkContext.union?

Do they compile to the same code?

sds asked Mar 20 '15 18:03


1 Answer

For exactly two RDDs, SparkContext.union and RDD.union are equivalent.

They differ once you have a list. Reducing over a list of RDDs with RDD.union produces a chain of nested UnionRDDs, each referencing the one before it, so the lineage grows one level deeper for every RDD in the list. A single call to SparkContext.union produces just one UnionRDD whose parents are all the input RDDs.
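
To make the shape of the two lineages concrete, here is a minimal toy model in plain Python. It does not use Spark at all: the Leaf and Union classes and the pairwise_union and union_all helpers are hypothetical stand-ins for a source RDD, a UnionRDD, RDD.union and SparkContext.union respectively. It only counts how many union nodes end up stacked on top of each other.

```python
from functools import reduce


class Leaf:
    """Stand-in for a source RDD (hypothetical, not a Spark class)."""

    def __init__(self, name):
        self.name = name

    def depth(self):
        # A source dataset has no union nodes above its data.
        return 0


class Union:
    """Stand-in for a UnionRDD: it only records its parent datasets."""

    def __init__(self, parents):
        self.parents = list(parents)

    def depth(self):
        # Number of union nodes on the longest path down to a leaf.
        return 1 + max(p.depth() for p in self.parents)


def pairwise_union(a, b):
    # Models RDD.union: each call wraps exactly two parents.
    return Union([a, b])


def union_all(datasets):
    # Models SparkContext.union: one node holding every parent.
    return Union(datasets)


leaves = [Leaf(f"rdd{i}") for i in range(5)]

nested = reduce(pairwise_union, leaves)  # like reduce(lambda x, y: x.union(y), ...)
flat = union_all(leaves)                 # like sc.union(...)

print(nested.depth())  # 4: one union node per reduce step
print(flat.depth())    # 1: a single union node over all five inputs
```

With real RDDs, printing RDD.toDebugString() for each result should show the same contrast, though the exact output depends on the Spark version.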

DPM answered Oct 06 '22 20:10