How to duplicate RDD into multiple RDDs?

Is it possible to duplicate an RDD into two or more RDDs?

I want to use the cassandra-spark driver to save an RDD into a Cassandra table and then keep going with more calculations on it (eventually saving that result to Cassandra as well).

AlonL asked Jan 19 '15 12:01


1 Answer

RDDs are immutable, and every transformation on an RDD returns a new RDD. There is therefore no need to copy an RDD in order to apply different operations to it.
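To make that concrete, here is a minimal sketch that derives two independent RDDs from one base RDD. It assumes Spark is on the classpath and runs in local mode; the object name `RddBranching`, the sample data and the `local[2]` master are made up for illustration and do not come from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddBranching {
  // Derives two independent RDDs from one base RDD and returns
  // (base contents, doubled contents, even-only contents).
  def branch(sc: SparkContext): (Seq[Int], Seq[Int], Seq[Int]) = {
    val base = sc.parallelize(Seq(1, 2, 3, 4)) // hypothetical sample data
    val doubled = base.map(_ * 2)              // returns a new RDD; base is unchanged
    val evens = base.filter(_ % 2 == 0)        // another new RDD from the same base
    (base.collect().toSeq, doubled.collect().toSeq, evens.collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch runs without a cluster
    val conf = new SparkConf().setAppName("rdd-branching").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(branch(sc)) finally sc.stop()
  }
}
```

Neither `map` nor `filter` touches `base`, so both branches can be built from it without any explicit copy.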

You can save the base RDD to secondary storage and then continue applying operations to that same RDD.

This is perfectly OK:

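import com.datastax.spark.connector._  // adds saveToCassandra to RDDs

val rdd = ???
// Key the records once; `base` is reused by every step below.
val base = rdd.keyBy(...)
base.saveToCassandra(ks, table)
// Both derived RDDs start from `base`, which is left unchanged.
val processed = base.map(...).reduceByKey(...)
processed.saveToCassandra(ks, processedTable)
val analyzed = base.map(...).join(suspectsRDD).reduceByKey(...)
analyzed.saveAsTextFile("./path/to/save")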
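
One thing worth knowing when a single RDD feeds several actions, as `base` does in the snippet above: Spark re-evaluates the lineage of that RDD for each action unless it is persisted. The sketch below illustrates the effect under some assumptions: it runs in local mode, `count()` stands in for `saveToCassandra` so no Cassandra cluster is needed, and `CacheDemo`, `evaluations` and the sample data are hypothetical names, not part of the original answer.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheDemo {
  // Runs two actions on the same keyed RDD and returns how many times
  // the keying function was evaluated across both of them.
  def evaluations(sc: SparkContext, persist: Boolean): Long = {
    val calls = sc.longAccumulator("keying calls")
    val base = sc.parallelize(1 to 4).map { x => calls.add(1L); (x % 2, x) }
    if (persist) base.cache()          // keep computed partitions in memory
    base.count()                       // first action (stand-in for a save)
    base.reduceByKey(_ + _).collect()  // second action, on the same base
    calls.sum
  }

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch runs without a cluster
    val conf = new SparkConf().setAppName("cache-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(s"without cache: ${evaluations(sc, persist = false)}")
      println(s"with cache:    ${evaluations(sc, persist = true)}")
    } finally sc.stop()
  }
}
```

In a local run the persisted variant should report fewer evaluations than the unpersisted one. Whether `cache()` pays off in a real job depends on how expensive the upstream work is and how much memory is available.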

maasg answered Sep 28 '22 22:09