My question is simple: I have a DataFrame, I group it by a column, and I get each group's size like this:
df.groupby('column').size()
Now the problem is that I only want the groups where the size is greater than X. I am wondering whether I can do this with a lambda function or something similar. I have already tried this:
df.groupby('column').size() > X
but that just gives me a Series of True and False values.
Try this code:

df.groupby('column').filter(lambda group: len(group) > X)

Note that on a grouped DataFrame, group.size counts cells (rows × columns), not rows, so use len(group) to test the number of rows in each group.
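Here is a minimal, runnable sketch of this approach; 'column' and X = 2 stand in for the asker's actual column name and threshold:

```python
import pandas as pd

# Hypothetical sample data: 'a' appears three times, 'b' twice, 'c' once.
df = pd.DataFrame({'column': ['a', 'b', 'a', 'a', 'b', 'c']})
X = 2

# filter() keeps every row belonging to a group whose row count exceeds X,
# and returns a DataFrame with the original row order preserved.
result = df.groupby('column').filter(lambda group: len(group) > X)
print(result)
# Only the 'a' rows survive, since 'a' occurs 3 times (> 2).
```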
The grouped result is a regular Series, so just filter it as usual:

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})
after = df.groupby('a').size()

>>> after
a
a    3
b    2
c    1
d    1
dtype: int64

>>> after[after > 2]
a
a    3
dtype: int64
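If you want the matching rows of the original DataFrame rather than the size counts, one common alternative (a sketch, using the same hypothetical data as above) is transform('size'), which broadcasts each group's size back onto its rows and gives a row-level boolean mask:

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})

# transform('size') returns a Series aligned with df, holding each
# row's group size; comparing against the threshold yields a mask.
mask = df.groupby('a')['a'].transform('size') > 2
result = df[mask]
print(result)
# Keeps only the rows whose value occurs more than twice ('a' here).
```

This avoids calling a Python lambda per group, which can matter when there are many groups.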