
Filtering grouped DataFrame in Pandas

I am creating a groupby object from a Pandas DataFrame and want to keep only the groups that contain more than one row.

Example:

     A  B
0  foo  0
1  bar  1
2  foo  2
3  foo  3

The following doesn't seem to work:

grouped = df.groupby('A')
grouped[grouped.size > 1]

Expected Result:

A
foo 0
    2
    3
Abhi asked Oct 31 '12 21:10





2 Answers

As of pandas 0.12 you can do:

>>> grouped.filter(lambda x: len(x) > 1)
     A  B
0  foo  0
2  foo  2
3  foo  3
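For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and applies filter. The variable names `df` and `result` are just for illustration, and it assumes a reasonably recent pandas.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "foo"], "B": [0, 1, 2, 3]})

# filter() calls the function once per group and keeps every row of the
# groups for which it returns True. Here that means groups with > 1 row.
result = df.groupby("A").filter(lambda g: len(g) > 1)

print(result)
```

The rows keep their original index labels, so the result can still be aligned with the source frame afterwards.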
elyase answered Sep 22 '22 08:09



I have found transform to be much more efficient than filter for very large DataFrames:

element_group_sizes = df['A'].groupby(df['A']).transform('size')
df[element_group_sizes > 1]

Or, in one line:

df[df['A'].groupby(df['A']).transform('size')>1] 
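A self-contained sketch of the transform approach on the question's sample data, in case it helps to see the intermediate values. The names `group_sizes` and `filtered` are only illustrative, and no timing is measured here, so the speed difference is not demonstrated by this snippet.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "foo"], "B": [0, 1, 2, 3]})

# transform('size') returns one value per original row: the size of the
# group that row belongs to. The result shares df's index.
group_sizes = df["A"].groupby(df["A"]).transform("size")

# Because the sizes are aligned with df, they can be used directly as a
# boolean mask to keep rows whose group has more than one member.
filtered = df[group_sizes > 1]

print(group_sizes.tolist())
print(filtered)
```

The mask is computed in a single vectorised pass rather than by calling a Python function per group, which is the likely reason it scales better on large inputs.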
Sealander answered Sep 20 '22 08:09
