When I call filter on the result of a Pandas groupby operation, it returns a DataFrame. If I then want to perform further group-wise computations, I have to call groupby again, which seems roundabout. Is there a more idiomatic way of doing this?
EDIT:
To illustrate what I'm talking about:
We shamelessly steal a toy dataframe from the Pandas docs, and group:
>>> import numpy as np
>>> import pandas as pd
>>> dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})
>>> grouped = dff.groupby('B')
>>> type(grouped)
<class 'pandas.core.groupby.DataFrameGroupBy'>
This returns a groupby object over which we can iterate, perform group-wise operations, etc. But if we filter:
>>> filtered = grouped.filter(lambda x: len(x) > 2)
>>> type(filtered)
<class 'pandas.core.frame.DataFrame'>
We get back a dataframe. Is there a nice idiomatic way of obtaining the filtered groups back, instead of just the original rows which belonged to the filtered groups?
If you want to combine a filter and an aggregate, the best way I can think of is to put both inside apply using a ternary if, returning None for filtered-out groups, and then call dropna to remove those rows from the final result:
grouped.apply(lambda x: x.sum() if len(x) > 2 else None).dropna()
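As a self-contained sketch of the same idea, the snippet below rebuilds the toy dataframe from the question and applies the ternary to a single column. Selecting column 'A' before apply is my own assumption, made so the grouping column does not take part in the sum; the names sums, sizes and alt are illustrative only.

```python
import numpy as np
import pandas as pd

dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})
grouped = dff.groupby('B')

# Filter and aggregate in one pass: groups with 2 or fewer rows yield
# None, which becomes a missing value that dropna() then removes.
sums = grouped['A'].apply(lambda s: s.sum() if len(s) > 2 else None).dropna()

# Alternative without apply: aggregate every group, then keep only the
# groups whose size passes the same test.
sizes = grouped.size()
alt = grouped['A'].sum()[sizes > 2]

print(sums)
print(alt)
```

Only group 'b' has more than two rows, so both results hold a single entry for 'b'. The second form avoids calling a Python lambda per group, which may matter when there are many groups.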
If you want to iterate through the groups, say to join them back together, you could use a generator expression:
pd.concat(g for i,g in grouped if len(g)>2)
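A runnable sketch of the iteration approach, again on the toy dataframe from the question. Collecting the surviving groups in a dict first is my own variation (the name kept_groups is illustrative); it keeps each group addressable by its key before anything is concatenated.

```python
import numpy as np
import pandas as pd

dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})
grouped = dff.groupby('B')

# Iterating a groupby yields (key, sub-frame) pairs; keep the large groups.
kept_groups = {key: g for key, g in grouped if len(g) > 2}

# Joining the surviving groups reproduces the rows that filter() returns.
rejoined = pd.concat(list(kept_groups.values()))
same_as_filter = rejoined.equals(grouped.filter(lambda x: len(x) > 2))

print(list(kept_groups))
print(same_as_filter)
```

Here kept_groups behaves like a filtered view of the groups: each value is an ordinary dataframe, so per-group work can be done on it directly without regrouping.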
Ultimately I think it would be better if groupby.filter had an option to return a groupby object.
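Until such an option exists, one workaround is to hide the regrouping behind a small helper. The function filter_groups below is hypothetical and not part of pandas; it simply filters and then groups again by the same key, so it costs a second groupby pass.

```python
import numpy as np
import pandas as pd

def filter_groups(df, by, predicate):
    """Hypothetical helper (not a pandas API): drop the groups that fail
    predicate, then return a fresh groupby over the remaining rows."""
    kept_rows = df.groupby(by).filter(predicate)
    return kept_rows.groupby(by)

dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})

regrouped = filter_groups(dff, 'B', lambda g: len(g) > 2)
group_keys = list(regrouped.groups)   # keys of the surviving groups
totals = regrouped['A'].sum()         # further group-wise computation

print(group_keys)
print(totals)
```

The result is a regular groupby object, so iteration, agg, transform and the rest work as usual on the filtered groups.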