Groupby with User Defined Functions Pandas

I understand that passing a function as a group key calls the function once per index value with the return values being used as the group names. What I can't figure out is how to call the function on column values.

So I can do this:

import numpy as np
import pandas as pd

people = pd.DataFrame(np.random.randn(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])

# Called once per index label; the return value becomes the group name
def GroupFunc(x):
    if len(x) > 3:
        return 'Group1'
    else:
        return 'Group2'

people.groupby(GroupFunc).sum()

This splits the data into two groups: one whose index values have length three or less, and the other whose index values have length greater than three. But how can I pass one of the column values? For example, grouping on whether the value in column a is greater than 1 for each index point. I realise I could just do the following:

people.groupby(people.a > 1).sum()

But I want to know how to do this in a user defined function for future reference.

Something like:

def GroupColFunc(x):
    if x > 1:
        return 'Group1'
    else:
        return 'Group2'

But how do I call this? I tried

people.groupby(GroupColFunc(people.a))

and similar variants but this does not work.

How do I pass the column values to the function? How would I pass multiple column values e.g. to group on whether people.a > people.b for example?

To group by a > 1, you can define your function like:

>>> def GroupColFunc(df, ind, col):
...     if df[col].loc[ind] > 1:
...         return 'Group1'
...     else:
...         return 'Group2'
...

And then call it like this:

>>> people.groupby(lambda x: GroupColFunc(people, x, 'a')).sum()
               a         b         c         d        e
Group2 -2.384614 -0.762208  3.359299 -1.574938 -2.65963

Or you can do it with just an anonymous function:

>>> people.groupby(lambda x: 'Group1' if people['b'].loc[x] > people['a'].loc[x] else 'Group2').sum()
               a         b         c         d         e
Group1 -3.280319 -0.007196  1.525356  0.324154 -1.002439
Group2  0.895705 -0.755012  1.833943 -1.899092 -1.657191
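Because the examples above use random data, the groups are hard to check by eye. Here is a minimal sketch of the same callable-key idea on a small fixed table. The values and the `make_key` helper are made up for illustration (they are not part of pandas), and the sketch assumes the index labels are unique, so that `df.loc[label]` returns a single row.

```python
import pandas as pd

# Small fixed table (hypothetical values) so the grouping can be checked by hand
people = pd.DataFrame(
    {"a": [2.0, 0.5, 1.5, -1.0], "b": [1.0, 3.0, 0.0, 4.0]},
    index=["Joe", "Steve", "Wes", "Jim"],
)

def make_key(df, predicate):
    """Hypothetical helper: build a function of an index label for groupby."""
    def key(label):
        # With unique labels, df.loc[label] is the row (a Series) for that label
        return "Group1" if predicate(df.loc[label]) else "Group2"
    return key

# One column: a > 1 puts Joe and Wes in Group1
by_a = people.groupby(make_key(people, lambda row: row["a"] > 1)).sum()

# Two columns: b > a puts Steve and Jim in Group1
by_ab = people.groupby(make_key(people, lambda row: row["b"] > row["a"])).sum()

print(by_a)
print(by_ab)
```

The function passed to `groupby` still only receives the index label; the column values are reached through the DataFrame captured in the closure, which is what the lambdas above do as well.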

As the documentation says, you can also group by passing a Series that provides a label -> group name mapping:

>>> mapping = np.where(people['b'] > people['a'], 'Group1', 'Group2')
>>> mapping
Joe       Group2
Steve     Group1
Wes       Group2
Jim       Group1
Travis    Group1
dtype: string48
>>> people.groupby(mapping).sum()
               a         b         c         d         e
Group1 -3.280319 -0.007196  1.525356  0.324154 -1.002439
Group2  0.895705 -0.755012  1.833943 -1.899092 -1.657191
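If you want to keep the scalar function from the question as it is, one option is to build the mapping Series by applying that function to the column values first and then group by the result. This is only a sketch under assumptions: the table values are made up, and `GroupRowFunc` is a hypothetical row-wise variant added to cover the two-column case.

```python
import pandas as pd

# Small fixed table (hypothetical values)
people = pd.DataFrame(
    {"a": [2.0, 0.5, 1.5, -1.0], "b": [1.0, 3.0, 0.0, 4.0]},
    index=["Joe", "Steve", "Wes", "Jim"],
)

def GroupColFunc(x):
    # Scalar function from the question: receives one column value at a time
    if x > 1:
        return "Group1"
    else:
        return "Group2"

def GroupRowFunc(row):
    # Hypothetical row-wise variant: receives one row as a Series
    return "Group1" if row["b"] > row["a"] else "Group2"

# Series.map calls the function once per value in column a
keys_a = people["a"].map(GroupColFunc)

# DataFrame.apply with axis=1 calls the function once per row
keys_ab = people.apply(GroupRowFunc, axis=1)

# Both results are Series indexed like people, so groupby aligns them by label
sums_a = people.groupby(keys_a).sum()
sums_ab = people.groupby(keys_ab).sum()

print(sums_a)
print(sums_ab)
```

For simple comparisons the vectorised `np.where` version is likely to be faster on large frames; `map` and `apply` are mainly useful when the grouping logic does not vectorise easily.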

Tags:

python

pandas

Woody Pride

1 Answers

Roman Pekar
