
Dask Dataframe split column of list into multiple columns

The same task in Pandas can be easily done with

import pandas as pd
df = pd.DataFrame({"lists":[[i, i+1] for i in range(10)]})
df[['left','right']] = pd.DataFrame([x for x in df.lists])

But I can't figure out how to do something similar with a dask.dataframe.

Update

So far I found this workaround

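import dask.dataframe as dd
ddf = dd.from_pandas(df, npartitions=2)
# meta describes each result as (name, dtype) so Dask does not have to guess
ddf["left"] = ddf.apply(lambda x: x["lists"][0], axis=1, meta=("left", "int64"))
ddf["right"] = ddf.apply(lambda x: x["lists"][1], axis=1, meta=("right", "int64"))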

I'm wondering if there is another way to proceed.

rpanai asked Jan 04 '23 17:01


1 Answer

You could achieve this using assign:

ddf = ddf.assign(left=ddf.lists.map(lambda x: x[0]),
                 right=ddf.lists.map(lambda x: x[1]))

e.g.,

ddf.compute()


     lists  left  right
0   [0, 1]     0      1
1   [1, 2]     1      2
2   [2, 3]     2      3
3   [3, 4]     3      4
4   [4, 5]     4      5
5   [5, 6]     5      6
6   [6, 7]     6      7
7   [7, 8]     7      8
8   [8, 9]     8      9
9  [9, 10]     9     10

An alternative way of phrasing this (see comments, below) might be

ddf = ddf.assign(**{k: ddf.lists.map(lambda x, i=i: x[i]) 
                 for i, k in enumerate(['left', 'right'])})
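
The i=i in that lambda is easy to overlook, so here is a small plain-Python sketch of why it is there. Dask only calls the lambda at compute time, after the comprehension has finished, so a lambda that merely closes over i would see its final value. The names late, bound, and pair are made up for this illustration.

```python
# Without a default argument, each lambda looks up `i` when it is called,
# which happens after the comprehension has finished.
late = [lambda x: x[i] for i in range(2)]

# With `i=i`, the current value of `i` is stored on each lambda when it is defined.
bound = [lambda x, i=i: x[i] for i in range(2)]

pair = [10, 20]
late_results = [f(pair) for f in late]    # both lambdas use the final i
bound_results = [f(pair) for f in bound]  # each lambda uses its own i

print(late_results)   # [20, 20]
print(bound_results)  # [10, 20]
```

Dropping the i=i from the dict comprehension would therefore most likely fill both left and right with the last element of each list.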

mdurant answered Jan 27 '23 07:01