My Dask DataFrame looks like this:
A  B    C   D
1  foo  xx  this
1  foo  xx  belongs
1  foo  xx  together
4  bar  xx  blubb
I want to group by columns A, B, and C and join the strings from D with a space between them, to get:
A  B    C   D
1  foo  xx  this belongs together
4  bar  xx  blubb
I see how to do this with pandas:
df_grouped = df.groupby(['A','B','C'])['D'].agg(' '.join).reset_index()
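For reference, here is a self-contained version of that pandas line, as a minimal sketch. The DataFrame is rebuilt by hand from the sample rows above, and the integer dtype of column A is an assumption.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (dtype of A is assumed to be int).
df = pd.DataFrame({
    "A": [1, 1, 1, 4],
    "B": ["foo", "foo", "foo", "bar"],
    "C": ["xx", "xx", "xx", "xx"],
    "D": ["this", "belongs", "together", "blubb"],
})

# Group on the three key columns and join the strings of D with a single space.
df_grouped = df.groupby(["A", "B", "C"])["D"].agg(" ".join).reset_index()
print(df_grouped)
```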
How can this be achieved with Dask?
ddf = ddf.groupby(['A','B','C'])['D'].apply(lambda row: ' '.join(row)).reset_index()
ddf.compute()
Output:
   A  B    C   D
0  1  foo  xx  this belongs together
0  4  bar  xx  blubb