I have a JavaRDD<Tuple2<String, String>> and need to transform it to a JavaPairRDD<String, String>. Currently I am doing it by simply writing a map function that just returns the input tuple as is. But I wonder if there is a better way?
JavaPairRDD exists to declare to the developer that a key and a value are required. A regular JavaRDD can be used for operations that don't require an explicit key field; these are generic operations on arbitrary element types.
In the scenario above, the code creates a Tuple2 object that groups two elements together (in Scala you just write the tuple, but in Java you must state explicitly that it is a tuple of two elements). The call() method then returns a tuple that contains the word and the value 1.
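To make the key/value contract concrete, here is a minimal sketch of the word-and-1 pattern described above. The class name WordPairsSketch, the helper countWords, and the local[*] master are assumptions chosen for illustration; only mapToPair, reduceByKey, and collectAsMap are actual Spark Java API calls, and Spark is assumed to be on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordPairsSketch {

    // Hypothetical helper: builds (word, 1) pairs and sums the ones per word.
    static Map<String, Integer> countWords(JavaSparkContext sc, List<String> words) {
        JavaRDD<String> rdd = sc.parallelize(words);

        // mapToPair takes a PairFunction; its call() method returns a Tuple2,
        // here written as a lambda that pairs each word with the value 1.
        JavaPairRDD<String, Integer> ones = rdd.mapToPair(word -> new Tuple2<>(word, 1));

        // reduceByKey needs a key, so it is offered on JavaPairRDD, not on JavaRDD.
        JavaPairRDD<String, Integer> counts = ones.reduceByKey((a, b) -> a + b);

        // Bring the per-word counts back to the driver as a java.util.Map.
        return counts.collectAsMap();
    }

    public static void main(String[] args) {
        // Local master is an assumption so the sketch can run without a cluster.
        SparkConf conf = new SparkConf().setAppName("WordPairsSketch").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            System.out.println(countWords(sc, Arrays.asList("a", "b", "a")));
        }
    }
}
```

The point of the sketch is that key-based operations such as reduceByKey only become available once the elements are declared as pairs.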
JavaPairRDD.fromJavaRDD(rdd) is one solution.
Try this example:
// mutateFunction is a method that generates an RDD of Tuple2 from the rdd_world RDD
JavaRDD<Tuple2<Integer, String>> mutate = mutateFunction(rdd_world);
JavaPairRDD<Integer, String> pairs = JavaPairRDD.fromJavaRDD(mutate);
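For the JavaRDD<Tuple2<String, String>> case from the question, a self-contained sketch might look like the following. The class name FromJavaRDDSketch, the helper names toPairs and toPairsViaMap, the sample data, and the local[*] master are assumptions for illustration; JavaPairRDD.fromJavaRDD and mapToPair are the actual Spark calls, and Spark is assumed to be on the classpath.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class FromJavaRDDSketch {

    // Converts directly: no per-element function is written by the caller.
    static JavaPairRDD<String, String> toPairs(JavaRDD<Tuple2<String, String>> tuples) {
        return JavaPairRDD.fromJavaRDD(tuples);
    }

    // The identity-mapping approach from the question, shown for comparison.
    static JavaPairRDD<String, String> toPairsViaMap(JavaRDD<Tuple2<String, String>> tuples) {
        return tuples.mapToPair(t -> t);
    }

    public static void main(String[] args) {
        // Local master is an assumption so the sketch can run without a cluster.
        SparkConf conf = new SparkConf().setAppName("FromJavaRDDSketch").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Hypothetical sample data.
            List<Tuple2<String, String>> data = Arrays.asList(
                    new Tuple2<>("k1", "v1"),
                    new Tuple2<>("k2", "v2"));
            JavaRDD<Tuple2<String, String>> tuples = sc.parallelize(data);

            JavaPairRDD<String, String> pairs = toPairs(tuples);
            // Pair-specific operations such as keys() are now available.
            System.out.println(pairs.keys().collect());
        }
    }
}
```

Both helpers yield a JavaPairRDD with the same elements; fromJavaRDD simply avoids writing the identity function by hand.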