Explanation of lambda function inside flatMap function: rdd.flatMap(lambda x: map(lambda e: (x[0], e), x[1]))?

Question

I had exactly the same question as the one at this link: Spark: Expansion of RDD(Key, List) to RDD(Key, Value), and the answer there turned out to be correct.

The question was to turn an RDD such as:

(1, List(1, 2, 3))

into

(1,1)
(1,2)
(1,3)

However, I would really like to understand what the lambda functions are doing so that I'm not just blindly copying and pasting. Could anyone please explain how this is working?

jxc · Accepted Answer

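In rdd.flatMap(lambda x: map(lambda e: (x[0], e), x[1])), the outer lambda receives one RDD element x, a (key, list) tuple such as (1, [1, 2, 3]), and the expression it returns: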

 map(lambda e: (x[0], e), x[1])

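produces the same elements as the following list comprehension (in Python 3, map returns a lazy iterator rather than a list, but flatMap accepts any iterable):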

[ (x[0], e) for e in x[1] ]

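For the example record, this yields the tuples (1, 1), (1, 2) and (1, 3). flatMap then flattens the iterable returned for each record, so every tuple becomes its own element in the resulting RDD instead of all three staying nested inside a single element.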
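To see each piece in isolation without a Spark cluster, here is a minimal pure-Python sketch. The helpers local_map and local_flat_map are hypothetical: they only mimic, on a local list, what RDD.map and RDD.flatMap do (call the function on each record, then either keep the results grouped or flatten them one level). They are not part of the PySpark API. The second record (2, [4, 5]) is made up purely to show how results from different records are concatenated.

```python
from itertools import chain


def local_flat_map(func, records):
    # Hypothetical stand-in for RDD.flatMap on a local list:
    # call func on every record, then flatten the returned iterables one level.
    return list(chain.from_iterable(func(r) for r in records))


def local_map(func, records):
    # Hypothetical stand-in for RDD.map: exactly one output per record, no flattening.
    return [func(r) for r in records]


# The first record is the example from the question; the second is made up
# only to illustrate concatenation across records.
records = [(1, [1, 2, 3]), (2, [4, 5])]

# Inner expression for a single record x: pair the key x[0] with each e in x[1].
x = records[0]
inner = map(lambda e: (x[0], e), x[1])  # a lazy iterator in Python 3, not a list
comprehension = [(x[0], e) for e in x[1]]
print(list(inner) == comprehension)  # True

# The outer lambda from the question, taking one (key, list) record.
outer = lambda x: map(lambda e: (x[0], e), x[1])

# map-like behaviour: one iterable per record, so the pairs stay grouped.
print([list(it) for it in local_map(outer, records)])
# [[(1, 1), (1, 2), (1, 3)], [(2, 4), (2, 5)]]

# flatMap-like behaviour: the per-record iterables are merged into one sequence.
print(local_flat_map(outer, records))
# [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5)]
```

In real PySpark the same lambda is passed straight to rdd.flatMap, which performs this flattening per partition; the helpers above only model the semantics.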

For this particular question, it is simpler to use flatMapValues, which passes each value through a flatMap-style function while keeping its key:

rdd.flatMapValues(lambda x: x).collect()
# [(1, 1), (1, 2), (1, 3)]
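flatMapValues can be modelled the same way. In the minimal sketch below, local_flat_map_values is a hypothetical helper (not the PySpark API) that applies func to the value only and re-attaches the unchanged key to every item it produces. With the identity function lambda x: x, the list value is itself the iterable being flattened, which is why the one-liner gives the same result as the nested-lambda version.

```python
def local_flat_map_values(func, pairs):
    # Hypothetical stand-in for RDD.flatMapValues on a local list of (key, value) pairs:
    # func turns each value into an iterable, and the key is attached to every item.
    return [(k, v) for k, value in pairs for v in func(value)]


pairs = [(1, [1, 2, 3])]

# Identity function: the list value is already the iterable to flatten.
result = local_flat_map_values(lambda x: x, pairs)
print(result)  # [(1, 1), (1, 2), (1, 3)]

# Same output as the nested-lambda flatMap from the question.
nested = [t for x in pairs for t in map(lambda e: (x[0], e), x[1])]
print(result == nested)  # True
```

One difference the sketch does not model: because the keys are untouched, PySpark's flatMapValues retains the parent RDD's partitioning, whereas flatMap does not preserve it by default.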


Tags: python, lambda, apache-spark, pyspark

mic9154
