Groupby multiple columns and aggregation with dask

My Dask dataframe looks like this:

A     B     C     D
1     foo   xx    this
1     foo   xx    belongs
1     foo   xx    together
4     bar   xx    blubb

I want to group by columns A, B, and C and join the strings from D with a space between them, to get:

A     B     C     D
1     foo   xx    this belongs together
4     bar   xx    blubb

I know how to do this with pandas:

df_grouped = df.groupby(['A','B','C'])['D'].agg(' '.join).reset_index()

How can this be achieved with Dask?

asked Oct 12 '25 12:10 by bucky


1 Answer

ddf = ddf.groupby(['A','B','C'])['D'].apply(lambda row: ' '.join(row)).reset_index()
ddf.compute()

Output:

Out[75]: 
   A    B   C                      D
0  1  foo  xx  this belongs together
0  4  bar  xx                  blubb

answered Oct 14 '25 11:10 by KRKirov


