Pandas groupby + transform and multiple columns

To compute group-level results while keeping the same level of detail as the original DataFrame (the same number of rows), I have been using the transform function.

Example: Original dataframe

name, year, grade
Jack, 2010, 6
Jack, 2011, 7
Rosie, 2010, 7
Rosie, 2011, 8

After groupby transform

name, year, grade, average grade
Jack, 2010, 6, 6.5
Jack, 2011, 7, 6.5
Rosie, 2010, 7, 7.5
Rosie, 2011, 8, 7.5
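
For reference, a table like the one above can be produced with a single-column transform. This is a minimal sketch, assuming the column names shown in the example; the variable name `grades` is made up for illustration.

```python
import pandas as pd

# Example data from the table above (variable name is illustrative only)
grades = pd.DataFrame({
    "name": ["Jack", "Jack", "Rosie", "Rosie"],
    "year": [2010, 2011, 2010, 2011],
    "grade": [6, 7, 7, 8],
})

# transform returns one value per original row, so the result
# can be assigned straight back as a new column
grades["average grade"] = grades.groupby("name")["grade"].transform("mean")
print(grades)
```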

However, with more advanced functions based on multiple columns, things get more complicated. What puzzles me is that I seem unable to access multiple columns in a groupby-transform combination.

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

def f(x):
    # intended: sum of columns a and b within each group
    y = sum(x['a']) + sum(x['b'])
    return y

df['e'] = df.groupby(['c', 'd']).transform(f)

Gives me:

KeyError: ('a', 'occurred at index a')

Though I know that the following does work:

df.groupby(['c','d']).apply(f)

What causes this behavior and how can I obtain something like this:

a   b   c   d   e
1   1   q   z   12
2   2   q   z   12
3   3   q   z   12
4   4   q   o   8
5   5   w   o   22
6   6   w   o   22
Willem asked Nov 08 '18 16:11


2 Answers

For this particular case you could do:

g = df.groupby(['c', 'd'])

df['e'] = g.a.transform('sum') + g.b.transform('sum')

df
# outputs

   a  b  c  d   e
0  1  1  q  z  12
1  2  2  q  z  12
2  3  3  q  z  12
3  4  4  q  o   8
4  5  5  w  o  22
5  6  6  w  o  22

If the final result can be built as a linear combination of independent transforms on the same groupby, this method works.

Otherwise, use a groupby-apply and then merge the result back into the original df.

Example:

# one value per (c, d) group, as a small DataFrame with columns c, d, e
sums = df.groupby(['c', 'd']).apply(lambda x: (x.a + x.b).sum()).rename('e').reset_index()
df.merge(sums, on=['c', 'd'])
# same output as above.
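
As for the cause of the KeyError: the error text ("occurred at index a") suggests that transform handed f one column at a time rather than the whole group, so x['a'] was looked up in a Series whose index holds row labels, not column names. Below is a minimal sketch of that difference. It calls the question's f directly on a group and on a single column, purely for illustration; it is not a reproduction of what pandas runs internally.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

def f(x):
    return sum(x['a']) + sum(x['b'])

# A whole group is a DataFrame with columns 'a' and 'b', so f works.
# This is the kind of object groupby(...).apply(f) passes to f.
group = df[(df['c'] == 'q') & (df['d'] == 'z')]
whole_group_result = f(group)

# A single column is a Series indexed by row labels 0..5,
# so 'a' is not a valid key and the lookup fails.
try:
    f(df['a'])
    single_column_error = None
except KeyError as err:
    single_column_error = err

print(whole_group_result)
print(repr(single_column_error))
```

This is why the approaches here either transform each column separately and combine afterwards, or switch to apply when the function needs several columns at once.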
Haleemur Ali answered Oct 23 '22 11:10


You can use GroupBy + transform with sum twice:

df['e'] = df.groupby(['c', 'd'])[['a', 'b']].transform('sum').sum(axis=1)

print(df)

   a  b  c  d   e
0  1  1  q  z  12
1  2  2  q  z  12
2  3  3  q  z  12
3  4  4  q  o   8
4  5  5  w  o  22
5  6  6  w  o  22
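
To make the two sums easier to follow, here is a sketch that splits the one-liner into its two steps, using the question's df. The intermediate name `per_column` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

# Step 1: group-wise sum of each selected column, broadcast back
# to the original rows (same index and length as df)
per_column = df.groupby(['c', 'd'])[['a', 'b']].transform('sum')
print(per_column)

# Step 2: add the per-column group sums across columns, row by row
df['e'] = per_column.sum(axis=1)
print(df)
```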
jpp answered Oct 23 '22 10:10