
Groupby.transform doesn't work in dask dataframe

I'm using the following dask.dataframe, named AID:

   AID FID  ANumOfF
0    1   X        1
1    1   Y        5
2    2   Z        6
3    2   A        1
4    2   X       11
5    2   B       18

I know in a pandas dataframe I could use:

AID.groupby('AID')['ANumOfF'].transform('sum')

to get:

0     6
1     6
2    36
3    36
4    36
5    36
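For anyone who wants to reproduce the pandas behaviour, here is a minimal sketch. It assumes the frame above is built directly in pandas; the construction code is not part of the original question.

```python
import pandas as pd

# Assumed reconstruction of the sample data shown in the question.
AID = pd.DataFrame({
    "AID": [1, 1, 2, 2, 2, 2],
    "FID": ["X", "Y", "Z", "A", "X", "B"],
    "ANumOfF": [1, 5, 6, 1, 11, 18],
})

# transform('sum') returns one value per input row (the sum of that row's
# group), so the result has the same length and index as the original frame.
per_row = AID.groupby("AID")["ANumOfF"].transform("sum")
print(per_row.tolist())  # [6, 6, 36, 36, 36, 36]
```

The point is that the result keeps the original row index, unlike a plain aggregation, which returns one row per group.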

I want to do the same with a dask.dataframe, which usually offers the same functions as a pandas dataframe, but in this case it gives me the following error:

AttributeError: 'SeriesGroupBy' object has no attribute 'transform'

I can think of two possible causes: either dask doesn't support transform, or it's because I'm using Python 3?

I tried the following code:

AID.groupby('AID')['ANumOfF'].sum()

but that just gives me the sum of each group like this:

AID
1     6
2    36

I need the output to look like the first result above, where each group's sum is repeated in every row of that group. My question is: if transform isn't supported, is there another way to achieve the same result?

BKS asked Apr 04 '17 12:04



1 Answer

I think you can use join:

s = AID.groupby('AID')['ANumOfF'].sum()
AID = AID.set_index('AID').drop('ANumOfF', axis=1).join(s).reset_index()
# .compute() materializes the lazy dask result so the rows can be printed
print(AID.compute())
   AID FID  ANumOfF
0    1   X        6
1    1   Y        6
2    2   Z       36
3    2   A       36
4    2   X       36
5    2   B       36

Or, as a faster solution, use map with the aggregated Series (or a dict built from it):

s = AID.groupby('AID')['ANumOfF'].sum()
# a bit faster
# s = AID.groupby('AID')['ANumOfF'].sum().to_dict()
AID['ANumOfF'] = AID['AID'].map(s)
# .compute() materializes the lazy dask result so the rows can be printed
print(AID.compute())
   AID FID  ANumOfF
0    1   X        6
1    1   Y        6
2    2   Z       36
3    2   A       36
4    2   X       36
5    2   B       36
jezrael answered Oct 05 '22 00:10
