What is the difference between
reduce(lambda x,y: x.union(y), myRDDlist)
which calls RDD.union and
sc.union(myRDDlist)
which calls SparkContext.union?
Do they compile to the same code?
SparkContext.union and RDD.union are equivalent if you have exactly two RDDs.
Reducing over a list of RDDs with RDD.union produces a chain of nested UnionRDDs, each one referencing the result of the previous union, whereas a single call to SparkContext.union produces only one UnionRDD over all of the inputs.
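To make the difference in lineage shape concrete, here is a small pure-Python toy model. It does not use Spark at all: the classes `Leaf` and `Union` and the helper `union_all` are hypothetical stand-ins for a source RDD, a UnionRDD, and SparkContext.union respectively. It only illustrates how the nesting grows, and is not how Spark is actually implemented.

```python
from functools import reduce


class Leaf:
    """Hypothetical stand-in for a source RDD (not a Spark class)."""

    def __init__(self, name):
        self.name = name

    def union(self, other):
        # Pairwise union: always creates a new node with exactly two parents.
        return Union([self, other])

    def depth(self):
        # A source node has no union layers above it.
        return 0


class Union(Leaf):
    """Hypothetical stand-in for a UnionRDD: a node that references its parents."""

    def __init__(self, parents):
        self.parents = list(parents)

    def depth(self):
        # Number of union layers between this node and the deepest source.
        return 1 + max(p.depth() for p in self.parents)


def union_all(nodes):
    # Stand-in for a single union call over the whole list: one node, many parents.
    return Union(nodes)


rdds = [Leaf(f"rdd{i}") for i in range(5)]

nested = reduce(lambda x, y: x.union(y), rdds)  # ((((r0 u r1) u r2) u r3) u r4)
flat = union_all(rdds)                          # one union over all five

print(nested.depth())  # 4 nested union layers
print(flat.depth())    # 1 union layer
```

In this model the reduce version adds one layer per element, so the chain grows linearly with the length of the list, while the single call stays at one layer regardless of how many inputs it receives.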