
Explanation of lambda function inside flatMap function: rdd.flatMap(lambda x: map(lambda e: (x[0], e), x[1]))?

I had the exact same question as the question found at this link: Spark: Expansion of RDD(Key, List) to RDD(Key, Value) and the answer did turn out to be correct.

The question was to turn an RDD such as:

(1, List(1, 2, 3))

into

(1,1)
(1,2)
(1,3)

However, I would really like to understand what the lambda functions are doing so that I'm not just blindly copying and pasting. Could anyone please explain how this is working?

asked Sep 17 '25 20:09 by mic9154


1 Answer

In rdd.flatMap(lambda x: map(lambda e: (x[0], e), x[1])), x is one RDD element, i.e. a (key, list) pair, and the inner expression:

 map(lambda e: (x[0], e), x[1])

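produces the same elements as the following list comprehension (in Python 3, map returns a lazy iterator rather than a list, which flatMap handles equally well):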

[ (x[0], e) for e in x[1] ]

which, for the example data, yields the tuples (1, 1), (1, 2) and (1, 3). flatMap then flattens that result, so each tuple becomes its own RDD element.
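
To watch the two steps separately without a Spark cluster, here is a minimal plain-Python sketch. The helper `local_flat_map` is hypothetical and only imitates on a local list what flatMap does on an RDD; it is not part of PySpark. The second record, (2, [4, 5]), is not from the question and is added only so the flattening is visible.

```python
from itertools import chain

def local_flat_map(func, records):
    """Hypothetical local stand-in for RDD.flatMap:
    apply func to every record, then flatten the results by one level."""
    return list(chain.from_iterable(func(x) for x in records))

# Records shaped like the question's data: (key, list of values).
# The second record is an illustrative assumption.
records = [(1, [1, 2, 3]), (2, [4, 5])]

# The function from the question: pair the key x[0] with each value e in x[1].
expand = lambda x: map(lambda e: (x[0], e), x[1])

# A plain map would keep one nested list per record ...
nested = [list(expand(x)) for x in records]

# ... while the flatMap stand-in merges them into one flat sequence of pairs.
flat = local_flat_map(expand, records)

print(nested)
print(flat)
```

Comparing `nested` with `flat` shows that the inner lambda only builds the pairs for one record; the flattening across records is what flatMap contributes.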

For this particular question, it's simpler to just use flatMapValues:

rdd.flatMapValues(lambda x: x).collect()
# [(1, 1), (1, 2), (1, 3)]
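
As a rough illustration of why the identity lambda is enough here, the sketch below imitates flatMapValues on a local list. The helper `local_flat_map_values` is hypothetical, written under the assumption that flatMapValues applies the function to the value only and re-attaches the unchanged key to every item the function returns.

```python
def local_flat_map_values(func, records):
    """Hypothetical local stand-in for RDD.flatMapValues:
    call func on each value and pair every resulting item with the original key."""
    return [(key, item) for key, value in records for item in func(value)]

# The example record from the question: key 1 with values 1, 2, 3.
records = [(1, [1, 2, 3])]

# The identity function returns the list itself, so each list item gets paired with the key.
result = local_flat_map_values(lambda x: x, records)
print(result)
```

Because the key is handled for you, the lambda no longer needs to refer to x[0] and x[1] at all.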
answered Sep 19 '25 12:09 by jxc