
python pandas how to get multiple columns of grouped result

I have a pandas DataFrame (named df) as follows:

id, a,  b,  c
1, 10, 10, 10
1, 20, 20, 20
2, 10, 10, 10
2, 20, 20, 20
3, 10, 10, 10
3, 20, 20, 20

I need to compute a result from multiple columns within each group.

grouped = df.groupby('id')
grouped['a','b','c'].apply(lambda x,y,z:x*y+z)

But the second line raises an error:

KeyError: ('a', 'b', 'c').

How can I get this result?

seizetheday asked Mar 26 '15 07:03


1 Answer

Your attempt raises a KeyError because grouped['a','b','c'] passes a single tuple, ('a', 'b', 'c'), as the key, and that tuple is not resolved into a selection of three columns. So while df.groupby('id')['a'].head() works, df.groupby('id')['a','b'].head() also raises a KeyError. To select several columns, pass a list of column names to the subscript operator, like so:

In [163]:

df.groupby('id')[['a','b','c']].apply(lambda x: x['a']*x['b']*x['c'])
Out[163]:
id   
1   0    1000
    1    8000
2   2    1000
    3    8000
3   4    1000
    5    8000
dtype: int64

EDIT

To illustrate why selecting columns the way you did might seem sensible: if we refer to the docs, we see that

df.groupby('id')['a']

is syntactic sugar for the more verbose:

df['a'].groupby(df['id'])

So if we try this:

df['a','b']

this will also raise a KeyError, whilst this does not:

df[['a','b']]
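
The difference between the two key forms can be checked in a few lines. This is a small sketch on a made-up three-row frame (the data and variable names are assumptions for illustration only):

```python
import pandas as pd

# Small illustrative frame; the values are arbitrary.
df = pd.DataFrame({"id": [1, 1, 2], "a": [10, 20, 10], "b": [10, 20, 10]})

# df['a', 'b'] is the same as df[('a', 'b')]: a single key that is a tuple,
# and no column carries that label.
try:
    df["a", "b"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# A list is treated as a collection of column labels.
subset = df[["a", "b"]]

# Selecting one column from the groupby object versus grouping the column
# itself. A Series has no 'id' column to look up by name, so the grouping
# key is passed here as the Series df['id'].
short = df.groupby("id")["a"].sum()
verbose = df["a"].groupby(df["id"]).sum()
```

Here subset keeps both columns, while the tuple lookup fails because Python hands the whole tuple to the indexing operator as one key.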

EdChum answered Oct 08 '22 20:10