I had exactly the same question as the one at this link: Spark: Expansion of RDD(Key, List) to RDD(Key, Value), and the answer there turned out to be correct.
The question was to turn an RDD such as:
(1, List(1, 2, 3))
into
(1,1)
(1,2)
(1,3)
However, I would really like to understand what the lambda functions are doing so that I'm not just blindly copying and pasting. Could anyone please explain how this is working?
In rdd.flatMap(lambda x: map(lambda e: (x[0], e), x[1]))
the inner expression:
map(lambda e: (x[0], e), x[1])
produces the same items as the following list comprehension:
[ (x[0], e) for e in x[1] ]
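which, for the example data, yields the tuples (1, 1), (1, 2) and (1, 3). Here x is one (key, list) record of the RDD, so x[0] is the key and x[1] is the list, and e is each element of that list in turn. Wrapping this in flatMap then flattens the result, so that each tuple becomes its own RDD element instead of the whole list being a single element.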
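To see the difference between map and flatMap without starting Spark, here is a rough sketch in plain Python. The names records and expand are made up for illustration, the second record is extra sample data, and itertools.chain only stands in for the flattening step; it is not how Spark implements flatMap.

```python
from itertools import chain

# Plain Python stand-in for the RDD's contents (assumption: no Spark involved).
records = [(1, [1, 2, 3]), (2, [4, 5])]

def expand(x):
    # x is one (key, list) record: x[0] is the key, x[1] is the list.
    return [(x[0], e) for e in x[1]]

# Roughly what rdd.map(expand) would hold: one list per input record.
mapped = [expand(x) for x in records]

# Roughly what rdd.flatMap(expand) would hold: the per-record lists joined end to end.
flat_mapped = list(chain.from_iterable(expand(x) for x in records))

print(mapped)
print(flat_mapped)
```

With map you would get one nested list per record; with flatMap every (key, value) pair ends up as its own element.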
For this particular question, it's simpler to just use flatMapValues:
rdd.flatMapValues(lambda x: x).collect()
# [(1, 1), (1, 2), (1, 3)]
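If it helps to see what flatMapValues is doing, it can be approximated in plain Python as below. flat_map_values is a hypothetical helper written only for this explanation, not a Spark function: it applies f to each value, keeps the key untouched, and emits one pair per item that f returns.

```python
def flat_map_values(records, f):
    # Hypothetical helper that mimics the behaviour of rdd.flatMapValues(f)
    # on an ordinary list of (key, value) pairs.
    return [(key, item) for key, value in records for item in f(value)]

records = [(1, [1, 2, 3])]

# The identity function returns the list itself, so each element is paired with the key.
result = flat_map_values(records, lambda x: x)
print(result)
```

Since the value is already the list you want to expand, the identity lambda is all that is needed.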