
Pandas groupby then drop groups below specified size

Tags:

python

pandas

I'm trying to separate a DataFrame into groups and drop groups below a minimum size (small outliers).

Here's what I've tried:

df.groupby(['A']).filter(lambda x: x.count() > min_size)
df.groupby(['A']).filter(lambda x: x.size() > min_size)
df.groupby(['A']).filter(lambda x: x['A'].count() > min_size)
df.groupby(['A']).filter(lambda x: x['A'].size() > min_size)

But these either throw an exception or return a different table than I'm expecting. I'd just like to filter, not compute a new table.

Caleb Jares asked Feb 08 '19 00:02




2 Answers

You can use len, which gives the number of rows in each group:

In [11]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

In [12]: df.groupby('A').filter(lambda x: len(x) > 1)
Out[12]:
   A  B
0  1  2
1  1  4
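
Outside an IPython session, the same idea can be sketched as a standalone script. The name min_size comes from the question; the value 1 is only an assumption chosen to match the example above.

```python
import pandas as pd

# Same frame as above: group A == 1 has two rows, group A == 5 has one.
df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

min_size = 1  # assumed threshold, chosen to mirror `len(x) > 1`

# filter() calls the function once per group and keeps every row of the
# groups for which it returns True; the original index is preserved.
kept = df.groupby('A').filter(lambda g: len(g) > min_size)
print(kept)
```

The lambda has to return a single boolean per group, which is why len(g) works here.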
Andy Hayden answered Nov 08 '22 20:11


The number of rows in each group is available as .shape[0]:

df.groupby('A').filter(lambda x: x.shape[0] >= min_size)

NB: If you want to remove the groups below the minimum size, keep those that are above or at the minimum size (>=, not >).
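
To make the difference between >= and > concrete, here is a small sketch on made-up data (the frame and min_size = 2 are assumptions for illustration). It also shows a different technique, a boolean mask built with transform('size'), which is not part of the answer above but should select the same rows without calling a Python lambda per group.

```python
import pandas as pd

# Illustrative data: group sizes are 2 (A == 1), 1 (A == 5) and 3 (A == 7).
df = pd.DataFrame({'A': [1, 1, 5, 7, 7, 7],
                   'B': [2, 4, 6, 8, 10, 12]})
min_size = 2  # assumed threshold

# Keeps groups with at least min_size rows (drops only A == 5).
at_least = df.groupby('A').filter(lambda g: g.shape[0] >= min_size)

# Keeps groups with strictly more than min_size rows (also drops A == 1).
strictly_more = df.groupby('A').filter(lambda g: g.shape[0] > min_size)

# Alternative technique: broadcast each group's size back onto its rows,
# then select rows with an ordinary boolean mask.
sizes = df.groupby('A')['A'].transform('size')
masked = df[sizes >= min_size]

print(at_least)
print(strictly_more)
print(masked)
```

On large frames with many groups the mask version may be faster, though that depends on the data and the pandas version.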

DYZ answered Nov 08 '22 20:11