I'm trying to separate a DataFrame into groups and drop groups below a minimum size (small outliers).
Here's what I've tried:
df.groupby(['A']).filter(lambda x: x.count() > min_size)
df.groupby(['A']).filter(lambda x: x.size() > min_size)
df.groupby(['A']).filter(lambda x: x['A'].count() > min_size)
df.groupby(['A']).filter(lambda x: x['A'].size() > min_size)
But these either throw an exception or return a different table than I'm expecting. I'd just like to filter, not compute a new table.
To group Pandas dataframe, we use groupby(). To sort grouped dataframe in descending order, use sort_values(). The size() method is used to get the dataframe size.
Pandas dataframe has groupby([column(s)]). first() method which is used to get the first record from each group.
Groupby preserves the order of rows within each group.
You can use len
:
In [11]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
In [12]: df.groupby('A').filter(lambda x: len(x) > 1)
Out[12]:
A B
0 1 2
1 1 4
The number of rows is in the attribute .shape[0]
:
df.groupby('A').filter(lambda x: x.shape[0] >= min_size)
NB: If you want to remove the groups below the minimum size, keep those that are above or at the minimum size (>=
, not >
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With