
Spark: JavaRDD<Tuple2> to JavaPairRDD<>

I have a JavaRDD<Tuple2<String, String>> and need to convert it to a JavaPairRDD<String, String>. Currently I do this with a map function that simply returns the input tuple unchanged. Is there a better way?

YuliaSh. asked Nov 19 '14 18:11




2 Answers

One solution is JavaPairRDD.fromJavaRDD(rdd). It is a static method that wraps the existing RDD directly, so no extra map step is needed.

YuliaSh. answered Sep 17 '22 15:09


Try this example:

// mutateFunction is a method that builds an RDD of Tuple2 elements from the rdd_world RDD
JavaRDD<Tuple2<Integer, String>> mutate = mutateFunction(rdd_world);
JavaPairRDD<Integer, String> pairs = JavaPairRDD.fromJavaRDD(mutate);
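
For anyone who wants to try the conversion in isolation, here is a minimal self-contained sketch. It assumes Spark and the Scala library are on the classpath and runs against a local master. The class name FromJavaRddExample, the helper collectPairs, and the sample data are hypothetical and only there for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

// Hypothetical example class; not part of the original answers.
public class FromJavaRddExample {

    // Builds a small RDD of tuples, converts it to a pair RDD,
    // and returns the pairs as a map so the result can be inspected.
    static Map<String, String> collectPairs() {
        SparkConf conf = new SparkConf()
                .setAppName("fromJavaRDD-example") // assumed app name
                .setMaster("local[1]");            // assumed local master
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // Made-up sample data
            List<Tuple2<String, String>> data = Arrays.asList(
                    new Tuple2<>("a", "1"),
                    new Tuple2<>("b", "2"));
            JavaRDD<Tuple2<String, String>> rdd = sc.parallelize(data);

            // Wrap the tuple RDD as a pair RDD without a map step
            JavaPairRDD<String, String> pairs = JavaPairRDD.fromJavaRDD(rdd);

            return pairs.collectAsMap();
        } finally {
            sc.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(collectPairs());
    }
}
```

Once the RDD is a JavaPairRDD, key-based operations such as reduceByKey or join become available on it.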
3xCh1_23 answered Sep 18 '22 15:09