
Python pandas dataframe group by based on a condition


My question is simple: I have a DataFrame, and I group it by a column and get the size of each group like this:

df.groupby('column').size() 

Now the problem is that I only want the groups whose size is greater than X. Can I do this with a lambda function or something similar? I have already tried this:

df.groupby('column').size() > X 

but that only prints True and False values for each group instead of the sizes themselves.

ahajib asked Jul 08 '15 20:07



2 Answers

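Try this code. It returns the rows of the original DataFrame that belong to groups with more than X rows. Note that on a DataFrame, .size is the number of elements (rows times columns), so len(group) is the safer way to count rows:

df.groupby('column').filter(lambda group: len(group) > X)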
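As a rough sketch of what filter() gives back, here is a small made-up example. The sample data, the extra 'value' column, and the threshold X = 2 are assumptions for illustration only; the lambda counts rows with len(group).

```python
import pandas as pd

# Hypothetical sample data; 'column' and X mirror the names used in the question.
df = pd.DataFrame({'column': ['a', 'b', 'a', 'a', 'b', 'c', 'd'],
                   'value': range(7)})
X = 2  # assumed threshold

# filter() keeps the original rows of every group that has more than X rows.
kept_rows = df.groupby('column').filter(lambda group: len(group) > X)

# Grouping the surviving rows again recovers the per-group counts.
kept_sizes = kept_rows.groupby('column').size()

print(kept_rows)
print(kept_sizes)
```

If only the counts are needed, filtering the result of size() directly avoids the second groupby.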
Jianxun Li answered Sep 21 '22 15:09


The result of groupby(...).size() is a regular Series, so just filter it with a boolean mask as usual:

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})
after = df.groupby('a').size()

>> after
a
a    3
b    2
c    1
d    1
dtype: int64

>> after[after > 2]
a
a    3
dtype: int64
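Since the question asks about a lambda, the same selection could also be written as one chained expression by passing a callable to .loc. This is only a sketch: the threshold X = 2 and the variable names big and rows are assumptions, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})
X = 2  # assumed threshold

# .loc accepts a callable that receives the Series and returns a boolean mask,
# so the sizes can be filtered without naming an intermediate variable.
big = df.groupby('a').size().loc[lambda s: s > X]

# The index of the filtered Series holds the surviving group keys, which can be
# used to pull the matching rows back out of the original DataFrame.
rows = df[df['a'].isin(big.index)]

print(big)
print(rows)
```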
Ami Tavory answered Sep 22 '22 15:09