Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Spark: FlatMapValues query

I'm reading the Learning Spark book and couldn't understand the following pair rdd transformation.

rdd.flatMapValues(x => (x to 5))

It is applied on an rdd {(1,2),(3,4),(3,6)} and the output of the transformation is {(1,2),(1,3),(1,4),(1,5),(3,4),(3,5)}

Can someone please explain this.

like image 211
Vinay Avatar asked May 18 '16 14:05

Vinay


People also ask

What does Flatmapvalues do?

Pass each value in the key-value pair RDD through a flatMap function without changing the keys; this also retains the original RDD's partitioning.

How do I create a key-value pair with Spark?

Creating a pair RDD using the first word as the key in Java. PairFunction < String , String , String > keyData = new PairFunction < String , String , String >() { public Tuple2 < String , String > call ( String x ) { return new Tuple2 ( x . split ( " " )[ 0 ], x ); } }; JavaPairRDD < String , String > pairs = lines .


1 Answers

flatMapValues method is a combination of flatMap and mapValues.

Let's start with the given rdd.

val sampleRDD = sc.parallelize(Array((1,2),(3,4),(3,6)))

mapValues maps the values while keeping the keys.

For example, sampleRDD.mapValues(x => x to 5) returns

Array((1,Range(2, 3, 4, 5)), (3,Range(4, 5)), (3,Range()))

notice that for key-value pair (3, 6), it produces (3,Range()) since 6 to 5 produces an empty collection of values.


flatMap "breaks down" collections into the elements of the collection. You can search for more accurate description of flatMap online like here and here.

For example,

given val rdd2 = sampleRDD.mapValues(x => x to 5), if we do rdd2.flatMap(x => x), you will get

Array((1,2),(1,3),(1,4),(1,5),(3,4),(3,5)).

That is, for every element in the collection in each key, we create a (key, element) pair.

Also notice that (3, Range()) does not produce any additional key element pair since the sequence is empty.

now combining flatMap and mapValues, you get flatMapValues.

like image 90
jtitusj Avatar answered Oct 09 '22 07:10

jtitusj