Spark: How RDD.map/mapToPair work with Java

I have some pairs cw (Integer i, String word), where i is the number of occurrences of word in a text file.

For each pair, I would simply like to create a new pair c1 (Integer i, 1), where 1 is a fixed number.

This seems trivial, but I haven't understood how the map/mapToPair functions actually work.

JavaPairRDD<Integer, Integer> c1 = cw.map(??? -> new Tuple2<Integer, Integer>(??, 1));

I am using Java 8.

rugrag asked Dec 09 '16 11:12


2 Answers

If I understand you correctly, you have the JavaPairRDD below.

JavaPairRDD<Integer, String> cw = ...;

Now you want to create the JavaPairRDD below, where the second value is always 1.

JavaPairRDD<Integer, Integer> c1;

One way to get this is to first extract a JavaRDD from the cw JavaPairRDD by calling the map function as shown below, keeping only the first value of each pair.

JavaRDD<Integer> cw1 = cw.map(tuple -> tuple._1());

Now create the new JavaPairRDD from this JavaRDD using the mapToPair function, as shown below.

JavaPairRDD<Integer, Integer> c1 = cw1.mapToPair(i -> new Tuple2<Integer, Integer>(i, 1));

In a single line, you can write it as:

JavaPairRDD<Integer, Integer> c1 = cw.map(tuple -> tuple._1()).mapToPair(i -> new Tuple2<Integer, Integer>(i, 1));
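
Since JavaPairRDD also provides mapToPair (it is inherited from JavaRDDLike), the intermediate map step can probably be dropped, e.g. cw.mapToPair(t -> new Tuple2<>(t._1(), 1)). To illustrate what the lambda does for each record, here is a minimal plain-Java sketch that needs no Spark installation. It is only an analogy: Map.Entry stands in for Tuple2, a List stands in for the RDD, and the class and method names (MapToPairSketch, toCountAndOne, mapAll) are made up for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class: a Spark-free model of what mapToPair does per record.
class MapToPairSketch {

    // Stand-in for the function passed to mapToPair: it receives one
    // (count, word) pair and returns a new (count, 1) pair.
    static Map.Entry<Integer, Integer> toCountAndOne(Map.Entry<Integer, String> pair) {
        return new SimpleEntry<>(pair.getKey(), 1);
    }

    // Applies the function to every element, one at a time, much as Spark
    // applies the lambda to every record of the RDD.
    static List<Map.Entry<Integer, Integer>> mapAll(List<Map.Entry<Integer, String>> cw) {
        return cw.stream()
                 .map(MapToPairSketch::toCountAndOne)
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the cw pairs.
        List<Map.Entry<Integer, String>> cw = new ArrayList<>();
        cw.add(new SimpleEntry<>(3, "spark"));
        cw.add(new SimpleEntry<>(5, "java"));

        System.out.println(mapAll(cw)); // prints [3=1, 5=1]
    }
}
```

The point of the sketch is that the lambda only ever sees a single pair; the word is simply ignored and the count is paired with the constant 1.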
abaghel answered Sep 28 '22 07:09


Here is what you can try (filtered is presumably a JavaRDD<String[]> whose elements each hold two numeric strings):

JavaPairRDD<Integer, Integer> tuples = filtered.mapToPair(
        f -> new Tuple2<Integer, Integer>(
                Integer.parseInt(f[0]),
                Integer.parseInt(f[1])));
KayV answered Sep 28 '22 09:09