Say I have this example job (in Groovy, using the Spark Java API):
def set1 = []
def set2 = []
0.upto(10) { set1 << it }
8.upto(20) { set2 << it }
def rdd1 = context.parallelize(set1)
def rdd2 = context.parallelize(set2)
//What next?
How do I get a set that is the delta between the two? I know that union can create an RDD that has all of the data in those RDDs, but how do I do the opposite of that?
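If you just want set subtraction (the elements of rdd1 that are not in rdd2), subtract is the answer. If you want the "outer" collection, that is, the symmetric difference (elements that appear in exactly one of the two RDDs), try: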
rdd1.subtract(rdd2).union(rdd2.subtract(rdd1))
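To see what that expression should produce for the data in the question, here is a rough local sketch in plain Java that mimics the two operations with java.util sets instead of RDDs. The class and the helper names (SymmetricDifferenceSketch, subtract, symmetricDifference, range) are made up for illustration and are not part of the Spark API. It also assumes the inputs have no duplicates, which is true for the example but not guaranteed for RDDs in general.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical local stand-in for the RDD expression; no Spark involved.
class SymmetricDifferenceSketch {

    // Local analogue of a.subtract(b): the elements of a that are not in b.
    static Set<Integer> subtract(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    // Local analogue of a.subtract(b).union(b.subtract(a)).
    static Set<Integer> symmetricDifference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = subtract(a, b);
        result.addAll(subtract(b, a));
        return result;
    }

    // Inclusive range, matching Groovy's from.upto(to).
    static Set<Integer> range(int from, int to) {
        return IntStream.rangeClosed(from, to)
                .boxed()
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        Set<Integer> set1 = range(0, 10);
        Set<Integer> set2 = range(8, 20);

        // prints [0, 1, 2, 3, 4, 5, 6, 7]
        System.out.println(subtract(set1, set2));

        // prints [0, 1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
        System.out.println(symmetricDifference(set1, set2));
    }
}
```

So for the question's data, 8, 9 and 10 (the overlap) drop out and everything else from either side stays. Collecting the RDD result should give the same elements, though not necessarily in sorted order.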