Filter groups after GroupBy in pandas while keeping the groups

Question

in pandas I want to do: df.groupby('A').filter(lambda x: x.name > 0) - group by column A and then filter groups that have the value of the name non positive. However this canceles the grouping as GroupBy.filter returns DataFrame and thus losing the groupings. I want to do it in this order as it should be less computationaly demanding because filter followed by groupby would walk the DataFrame twice no (first filtering and then grouping)? Also cloning the groups from the grouping (to a dict or something) would lose me the functionality to seamlessly go back to dataframe (like in the example of .filter that you directly get the DataFrame)

Thanks

Example:

df.groupby('A'):

GroupBy object
-1 : [1, 2]
 0 : [3]
 1 : [4,5]

GroupBy.filter(lambda x: x.name >= 0):

GroupBy object
 0 : [3]
 1 : [4,5]

Thanks

Example:

df.groupby('A'):

GroupBy object
-1 : [1, 2]
 0 : [3]
 1 : [4,5]

GroupBy.filter(lambda x: x.name >= 0):

GroupBy object
 0 : [3]
 1 : [4,5]

petsol · Accepted Answer

I think the previous answers propose workarounds, which are maybe useful in your case but doesn't answer the question.

You created groups, and you want to throw out or keep some groups based on group statistics THEN perform some group statistics you actually care for on the groups. This should be possible, and useful in many cases, however, it is not possible now as a chained command (as far as I know) only if you use two identical groupbys consequently.

Let's make a case: Groupby reveals some features that are not filterable on an item level basis (so previous filtering is not an option). For example a group sum. The annoyment in filter is, that it returns a dataframe rather than keeping the grouping and allow you to perform further computations on the groups.

Here is an example:

Let's say you want to group by 'C' and filter on the sums of 'A' in the groups (<700), but in the filtered groups you actually care for the std of the groups. If filter would just be a filter on groups, this would work:

df.groupby(['C']).filter(lambda x:x['A'].sum()<700, combine=False).std()

this doesn't work (note the nonexistent combine=False option on filter), what does is this:

df.groupby(['C']).filter(lambda x:x['A'].sum()<700).groupby(['C']).std()

What filter does is actually filter&combine, which follows the split-apply-combine logic.

David Stevens · Answer

Using groupby doesn't actually aggregate the values in any way. It just creates the groupings, so the filter is essentially filtering on the original dataframe. I don't think you're saving time or computation by grouping first unless name is value achieved by applying a function to the groups.

Therefore I'd recommend something like

df.where(df.name > 0).groupby('A')  # now apply some transformation to the groups

Therefore I'd recommend something like

df.where(df.name > 0).groupby('A')  # now apply some transformation to the groups

Filter groups after GroupBy in pandas while keeping the groups

Tags:

python

pandas

pandas-groupby

Péťa Poliak

2 Answers

petsol

David Stevens

Recent Activity

Donate For Us

Filter groups after GroupBy in pandas while keeping the groups

Tags:

python

pandas

pandas-groupby

Péťa Poliak

2 Answers

petsol

David Stevens

Related questions

Recent Activity

Donate For Us