Is it possible to duplicate a RDD into two or several RDDs ?
I want to use the cassandra-spark driver and save a RDD into a Cassandra table, and, in addition, keep going with more calculations (and eventually save the result to Cassandra as well).
RDD
s are immutable and transformations on RDDs create new RDDs. Therefore, it's not necessary to create copies of an RDD to apply different operations.
You could save the base RDD to secondary storage and further apply operations to it.
This is perfectly OK:
val rdd = ???
val base = rdd.byKey(...)
base.saveToCassandra(ks,table)
val processed = byKey.map(...).reduceByKey(...)
processed.saveToCassandra(ks,processedTable)
val analyzed = base.map(...).join(suspectsRDD).reduceByKey(...)
analyzed.saveAsTextFile("./path/to/save")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With