 

Merge list of lists in pySpark RDD

I have an RDD whose elements are lists of tuples, and I want to combine them into one flat list. Using lambdas and list comprehensions, I've processed the data to the point where I'm close to being able to use reduceByKey, but I'm not sure how to merge the lists. The current format is:

[[(0, 14), (0, 24)], [(1, 19), (1, 50)], ...]

And I would like it to be:

[(0, 14), (0, 24), (1, 19), (1, 50), ...]

This is the code that got me this far:

test = test.map(lambda x: (x[1], [e * local[x[1]] for e in x[0]]))
test = test.map(lambda x: [(x[0], y) for y in x[1]])

But I'm not sure what to do from there to merge the lists.
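To make the shapes concrete, here is a plain-Python sketch (no Spark involved) that runs the same two lambdas over an ordinary list standing in for the RDD. The input records and the local lookup are hypothetical sample data, chosen only so that the output matches the nested format shown above; the question does not say what local actually contains.

```python
# Plain-Python stand-in for the RDD: each record is (values, key).
# `records` and `local` are hypothetical sample data, not from the question.
records = [([7, 12], 0), ([19, 50], 1)]
local = {0: 2, 1: 1}  # hypothetical per-key multiplier

# Step 1: (values, key) -> (key, values scaled by local[key])
step1 = list(map(lambda x: (x[1], [e * local[x[1]] for e in x[0]]), records))

# Step 2: (key, values) -> one list of (key, value) pairs per record
step2 = list(map(lambda x: [(x[0], y) for y in x[1]], step1))

print(step2)  # [[(0, 14), (0, 24)], [(1, 19), (1, 50)]]
```

Because both steps use map, each input record still yields exactly one output element, which is why the result stays a list of lists.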

asked by cpd1, Mar 07 '23 21:03



1 Answer

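You can use flatMap, which applies a function to each element and then flattens the returned iterables into a single RDD. Passing an identity function simply removes one level of nesting: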

identity = lambda x: x  # Python has no built-in identity(); define one
test = test.flatMap(identity)

or

test = test.flatMap(lambda lst: lst)  # avoid shadowing the built-in name list
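A local, plain-Python model of what flatMap does here, plus the reduceByKey step the question is heading toward, may help show the data flow. This is an illustration of the semantics only, not Spark's implementation; flat_map and reduce_by_key are hypothetical helper names, and summing with operator.add is an assumed combining function since the question doesn't say how values should be merged.

```python
from itertools import chain
from operator import add


def flat_map(f, items):
    """Local model of RDD.flatMap: apply f to each item, then concatenate the results."""
    return list(chain.from_iterable(f(item) for item in items))


def reduce_by_key(f, pairs):
    """Local model of RDD.reduceByKey: merge the values of each key with f."""
    merged = {}
    for key, value in pairs:
        merged[key] = f(merged[key], value) if key in merged else value
    return list(merged.items())


nested = [[(0, 14), (0, 24)], [(1, 19), (1, 50)]]

# Identity function: each inner list is spliced into the output.
flat = flat_map(lambda lst: lst, nested)
print(flat)  # [(0, 14), (0, 24), (1, 19), (1, 50)]

# Hypothetical follow-up step: sum the values for each key.
print(reduce_by_key(add, flat))  # [(0, 38), (1, 69)]
```

On a real RDD the order of the reduced pairs is not guaranteed. Since flatMap already flattens whatever the function returns, the two map calls in the question could likely be folded into one step that emits the pairs directly, e.g. test.flatMap(lambda x: [(x[1], e * local[x[1]]) for e in x[0]]).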
answered by mrsrinivas, Mar 20 '23 15:03
