How can I apply a function that returns a list to a grouped pandas DataFrame using transform instead of apply, so that the result is not aggregated?
For me, the following fails with: KeyError: "None of [Index(['v1', 'v2'], dtype='object')] are in the [index]"
import pandas as pd
df = pd.DataFrame({'key':[1,1,1,2,3,2], 'v1': [1,4,6,7,4,9], 'v2':[0.3, 0.6, 0.4, .1, .2, .8]})
display(df)
def list_function(x):
    #display(x)
    all_values = x[['v1','v2']].drop_duplicates()
    #display(all_values)
    #result = all_values.to_json()
    result = all_values.values
    return result
display(df.groupby(['key']).apply(list_function))
df['list_result'] = df.groupby(['key']).transform(list_function)
df
NOTICE: I know that a join would be possible with the aggregated data, but in this particular case I would prefer not having to do the JOIN afterwards.
This is not possible with transform. In pandas, GroupBy.transform and GroupBy.agg work on each column separately, so the function cannot select several columns by name the way yours does.
It is possible only with GroupBy.apply.
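To illustrate the difference, here is a minimal sketch that records what each method hands to the function. The helper names record_for_transform and record_for_apply are made up for this example. As far as I know, transform may also try the whole sub-DataFrame as a fast path after the column-by-column pass, so the sketch only shows that a Series is passed at some point, which is what triggers the KeyError in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'key': [1, 1, 1, 2, 3, 2],
    'v1': [1, 4, 6, 7, 4, 9],
    'v2': [0.3, 0.6, 0.4, 0.1, 0.2, 0.8],
})

transform_inputs = []  # type names of the objects transform passes in
apply_inputs = []      # type names of the objects apply passes in

def record_for_transform(x):
    # Hypothetical helper: remember the input type, return it unchanged
    # so the result has the same shape as the input.
    transform_inputs.append(type(x).__name__)
    return x

def record_for_apply(x):
    # Hypothetical helper: remember the input type, return one value per group.
    apply_inputs.append(type(x).__name__)
    return len(x)

df.groupby('key').transform(record_for_transform)
# Selecting the value columns explicitly keeps the grouping column out of the groups.
df.groupby('key')[['v1', 'v2']].apply(record_for_apply)

print(transform_inputs)  # contains 'Series': single columns were passed
print(apply_inputs)      # only 'DataFrame': whole groups were passed
```

Because a single column has no 'v1' or 'v2' labels in its index, x[['v1','v2']] cannot work inside transform.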
So instead of transform, use Series.map if you match on one column; to match on multiple columns, use DataFrame.join:
df['list_result'] = df['key'].map(df.groupby(['key']).apply(list_function))
print (df)
   key  v1   v2                           list_result
0    1   1  0.3  [[1.0, 0.3], [4.0, 0.6], [6.0, 0.4]]
1    1   4  0.6  [[1.0, 0.3], [4.0, 0.6], [6.0, 0.4]]
2    1   6  0.4  [[1.0, 0.3], [4.0, 0.6], [6.0, 0.4]]
3    2   7  0.1              [[7.0, 0.1], [9.0, 0.8]]
4    3   4  0.2                          [[4.0, 0.2]]
5    2   9  0.8              [[7.0, 0.1], [9.0, 0.8]]
# add a second column to demonstrate matching on two columns
df['new'] = 1
s = df.groupby(['key', 'new']).apply(list_function)
df = df.join(s.rename('list_result'), on=['key','new'])
print (df)
   key  v1   v2  new                           list_result
0    1   1  0.3    1  [[1.0, 0.3], [4.0, 0.6], [6.0, 0.4]]
1    1   4  0.6    1  [[1.0, 0.3], [4.0, 0.6], [6.0, 0.4]]
2    1   6  0.4    1  [[1.0, 0.3], [4.0, 0.6], [6.0, 0.4]]
3    2   7  0.1    1              [[7.0, 0.1], [9.0, 0.8]]
4    3   4  0.2    1                          [[4.0, 0.2]]
5    2   9  0.8    1              [[7.0, 0.1], [9.0, 0.8]]
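If you would rather not depend on how GroupBy.apply wraps array results, a possible variation is to build the per-key lookup yourself by iterating over the groups and then map it back onto the key column. This is only a sketch under my own assumptions: the name pairs_by_key is invented for the example, and it stores nested Python lists (via tolist()) instead of the NumPy arrays returned by list_function.

```python
import pandas as pd

df = pd.DataFrame({
    'key': [1, 1, 1, 2, 3, 2],
    'v1': [1, 4, 6, 7, 4, 9],
    'v2': [0.3, 0.6, 0.4, 0.1, 0.2, 0.8],
})

# Build {key: distinct (v1, v2) pairs} by looping over the groups directly.
# pairs_by_key is a hypothetical name used only in this sketch.
pairs_by_key = {
    key: group[['v1', 'v2']].drop_duplicates().values.tolist()
    for key, group in df.groupby('key')
}

# Broadcast the per-key result back to every row, as transform would have done.
df['list_result'] = df['key'].map(pairs_by_key)
print(df)
```

Rows that share a key end up with the same list, so no separate join step is needed for the single-key case.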