
How can I filter a Pandas GroupBy object and obtain a GroupBy object back?

Tags: python, pandas

Calling filter on the result of a Pandas groupby operation returns a dataframe. If I then want to perform further group computations, I have to call groupby again, which seems roundabout. Is there a more idiomatic way of doing this?

EDIT:

To illustrate what I'm talking about:

We shamelessly steal a toy dataframe from the Pandas docs, and group:

>>> import numpy as np
>>> import pandas as pd
>>> dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})
>>> grouped = dff.groupby('B')
>>> type(grouped)
<class 'pandas.core.groupby.DataFrameGroupBy'>

This returns a groupby object over which we can iterate, perform group-wise operations, etc. But if we filter:

>>> filtered = grouped.filter(lambda x: len(x) > 2)
>>> type(filtered)
<class 'pandas.core.frame.DataFrame'>

We get back a dataframe. Is there a nice idiomatic way of obtaining the filtered groups back, instead of just the original rows which belonged to the filtered groups?

Rob Lachlan asked Mar 06 '16 20:03


1 Answer

If you want to combine a filter and an aggregate, the best way I can think of is a ternary if inside apply: return the aggregate for groups that pass the filter and None for the rest, then call dropna to remove the rejected groups from the final result:

grouped.apply(lambda x: x.sum() if len(x) > 2 else None).dropna()
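A minimal runnable sketch of this approach, assuming the toy dataframe from the question. Selecting column 'A' before apply is an assumption added here so that only the numeric column is summed; the one-liner above applies the function to the whole group instead.

```python
import numpy as np
import pandas as pd

# Toy dataframe from the question
dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})
grouped = dff.groupby('B')

# Sum column A for groups with more than 2 rows; return None otherwise.
# (Selecting 'A' first is an assumption of this sketch, not part of the
# original one-liner.)
result = grouped['A'].apply(lambda x: x.sum() if len(x) > 2 else None)

# Groups that returned None show up as missing values and are dropped here.
result = result.dropna()
print(result)
```

Only group 'b' has more than two rows, so it is the only key left after dropna.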

If you want to iterate through the groups, say to join them back together, you could use a generator expression:

pd.concat(g for _, g in grouped if len(g) > 2)
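To make the iteration explicit, here is a small sketch, again assuming the toy dataframe from the question. The names big_groups and rejoined are hypothetical and only used for illustration.

```python
import numpy as np
import pandas as pd

# Toy dataframe from the question
dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})
grouped = dff.groupby('B')

# Iterating a GroupBy yields (key, sub-dataframe) pairs, so the filter
# can be written as an ordinary comprehension condition.
big_groups = {key: g for key, g in grouped if len(g) > 2}

# Joining the surviving groups gives back the rows of the kept groups.
rejoined = pd.concat(list(big_groups.values()))
print(rejoined)
```

Keeping the groups in a dict keyed by group label lets you look up an individual group later without grouping again, at the cost of holding plain dataframes rather than a groupby object.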

Ultimately I think it would be better if groupby.filter had an option to return a groupby object.
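Until such an option exists, one possible workaround is to wrap the filter-then-regroup round trip in a small function. This is only a sketch: filter_then_group is a hypothetical helper, not part of pandas, and it simply calls groupby a second time under the hood.

```python
import numpy as np
import pandas as pd


def filter_then_group(df, by, predicate):
    """Hypothetical helper: keep the groups for which predicate(group) is
    True, then return a new GroupBy over the remaining rows."""
    kept = df.groupby(by).filter(predicate)
    return kept.groupby(by)


# Toy dataframe from the question
dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})

regrouped = filter_then_group(dff, 'B', lambda g: len(g) > 2)

# The result is a GroupBy again, so further group-wise operations work.
sums = regrouped['A'].sum()
print(sums)
```

This does not avoid the second groupby call; it only hides it, which may be enough if the goal is cleaner call sites.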

maxymoo answered Sep 19 '22 15:09