I have a pandas DataFrame (named df) as follows:
id, a, b, c
1, 10, 10, 10
1, 20, 20, 20
2, 10, 10, 10
2, 20, 20, 20
3, 10, 10, 10
3, 20, 20, 20
I need to compute a result from multiple columns within each group:
grouped = df.groupby('id')
grouped['a','b','c'].apply(lambda x,y,z:x*y+z)
But the second line raises an error:
KeyError: ('a', 'b', 'c')
How can I do this?
Your attempt raises a KeyError because a bare comma-separated list of names is not resolved into a column selection the way you might expect: Python passes it as a single tuple key. So whilst df.groupby('id')['a'].head() works, df.groupby('id')['a','b'].head() will also raise a KeyError. To select the columns of interest, pass a list of them to the subscript operator, like so:
In [163]:
df.groupby('id')[['a','b','c']].apply(lambda x: x['a']*x['b']*x['c'])
Out[163]:
id
1   0    1000
    1    8000
2   2    1000
    3    8000
3   4    1000
    5    8000
dtype: int64
EDIT
To shed more light on why your column selection fails even though it looks sensible: if we refer to the docs, we see that
df.groupby('id')['a']
is syntactic sugar for the more verbose:
df['a'].groupby(df['id'])
So if we try this:
df['a','b']
this will also raise a KeyError, whilst this does not:
df[['a','b']]
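The difference between the two forms on a plain DataFrame can be checked with a short sketch like the following. It assumes the sample data from the question; the names tuple_lookup_failed and subset are only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "a": [10, 20, 10, 20, 10, 20],
    "b": [10, 20, 10, 20, 10, 20],
    "c": [10, 20, 10, 20, 10, 20],
})

# df["a", "b"] is the same as df[("a", "b")]: pandas looks for ONE column
# whose label is the tuple ("a", "b"), which does not exist here.
try:
    df["a", "b"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# A list inside the brackets selects several columns and returns a DataFrame.
subset = df[["a", "b"]]

print(tuple_lookup_failed)
print(list(subset.columns))
```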