I'm using the following dask.dataframe AID:
AID FID ANumOfF
0 1 X 1
1 1 Y 5
2 2 Z 6
3 2 A 1
4 2 X 11
5 2 B 18
I know in a pandas dataframe I could use:
AID.groupby('AID')['ANumOfF'].transform('sum')
to get:
0 6
1 6
2 36
3 36
4 36
5 36
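For reference, here is a minimal, self-contained pandas sketch of that transform call. It rebuilds the sample frame from the table above. The point is that transform('sum') returns a Series aligned to the original index, so every row gets its group's total. A plain sum() would instead return one row per group.

```python
import pandas as pd

# Sample frame rebuilt from the question's table
AID = pd.DataFrame({
    "AID": [1, 1, 2, 2, 2, 2],
    "FID": ["X", "Y", "Z", "A", "X", "B"],
    "ANumOfF": [1, 5, 6, 1, 11, 18],
})

# transform keeps the original row index, broadcasting each group's sum
totals = AID.groupby("AID")["ANumOfF"].transform("sum")
print(totals)
```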
I want to do the same with dask.dataframe, which usually provides the same functions as a pandas DataFrame, but in this case I get the following error:
AttributeError: 'SeriesGroupBy' object has no attribute 'transform'
Is this because dask doesn't support transform, or because I'm using Python 3?
I tried the following code:
AID.groupby('AID')['ANumOfF'].sum()
but that just gives me the sum of each group like this:
AID
1 6
2 36
I need the result to look like the one above, with each group's sum repeated on every row. If transform isn't supported, is there another way to achieve the same result?
I think you can use join:
s = AID.groupby('AID')['ANumOfF'].sum()
AID = AID.set_index('AID').drop('ANumOfF', axis=1).join(s).reset_index()
# dask is lazy; compute() materializes the result so print shows the values
print(AID.compute())
AID FID ANumOfF
0 1 X 6
1 1 Y 6
2 2 Z 36
3 2 A 36
4 2 X 36
5 2 B 36
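The join steps can be checked with a self-contained pandas sketch. It uses pandas only, so it runs without dask. dask.dataframe exposes the same set_index, drop, join and reset_index calls, but in dask set_index sorts and shuffles the data, and you need .compute() to see the values.

```python
import pandas as pd

AID = pd.DataFrame({
    "AID": [1, 1, 2, 2, 2, 2],
    "FID": ["X", "Y", "Z", "A", "X", "B"],
    "ANumOfF": [1, 5, 6, 1, 11, 18],
})

# One total per group, indexed by AID (Series named "ANumOfF")
s = AID.groupby("AID")["ANumOfF"].sum()

# Put AID in the index, drop the per-row values, left-join the totals
# on the index, then turn AID back into a regular column
result = (
    AID.set_index("AID")
    .drop("ANumOfF", axis=1)
    .join(s)
    .reset_index()
)
print(result)
```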
Or, a faster solution: map the AID column using the aggregated Series or a dict:
s = AID.groupby('AID')['ANumOfF'].sum()
#a bit faster: compute the small aggregate and map with a plain dict
#s = AID.groupby('AID')['ANumOfF'].sum().compute().to_dict()
AID['ANumOfF'] = AID['AID'].map(s)
print(AID.compute())
AID FID ANumOfF
0 1 X 6
1 1 Y 6
2 2 Z 36
3 2 A 36
4 2 X 36
5 2 B 36
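The dict variant can be checked the same way with a minimal pandas sketch. With dask, you would call .compute() on the aggregate before .to_dict(). The aggregate has only one entry per group, so it is assumed to fit in memory. A dict lookup is applied row by row, so it needs no join or shuffle. That is probably why it is faster, though this is a hedged explanation rather than a measured one.

```python
import pandas as pd

AID = pd.DataFrame({
    "AID": [1, 1, 2, 2, 2, 2],
    "FID": ["X", "Y", "Z", "A", "X", "B"],
    "ANumOfF": [1, 5, 6, 1, 11, 18],
})

# Small aggregate turned into a plain dict: {AID value: group total}
totals = AID.groupby("AID")["ANumOfF"].sum().to_dict()

# map looks up each row's AID in the dict, broadcasting the total
AID["ANumOfF"] = AID["AID"].map(totals)
print(AID)
```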