My question is simple: I have a DataFrame, I group it by a column, and I get each group's size like this:
df.groupby('column').size()
Now the problem is that I only want the groups where the size is greater than X. I am wondering whether I can do this with a lambda function or something similar. I have already tried this:
df.groupby('column').size() > X
but that just gives me a Series of True and False values.
Try this code:

df.groupby('column').filter(lambda group: len(group) > X)

Note that on a grouped DataFrame, group.size counts cells (rows × columns), not rows, so use len(group) to test the number of rows in each group.
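Here is a minimal, runnable sketch of this approach; 'column' and X = 2 stand in for the asker's actual column name and threshold:

```python
import pandas as pd

# Hypothetical sample data: 'a' appears three times, 'b' twice, 'c' once.
df = pd.DataFrame({'column': ['a', 'b', 'a', 'a', 'b', 'c']})
X = 2

# filter() keeps every row belonging to a group whose row count exceeds X,
# and returns a DataFrame with the original row order preserved.
result = df.groupby('column').filter(lambda group: len(group) > X)
print(result)
# Only the 'a' rows survive, since 'a' occurs 3 times (> 2).
```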
The grouped result is a regular Series, so just filter it as usual:

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})
after = df.groupby('a').size()

>>> after
a
a    3
b    2
c    1
d    1
dtype: int64

>>> after[after > 2]
a
a    3
dtype: int64
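If you want the matching rows of the original DataFrame rather than the size counts, one common alternative (a sketch, using the same hypothetical data as above) is transform('size'), which broadcasts each group's size back onto its rows and gives a row-level boolean mask:

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})

# transform('size') returns a Series aligned with df, holding each
# row's group size; comparing against the threshold yields a mask.
mask = df.groupby('a')['a'].transform('size') > 2
result = df[mask]
print(result)
# Keeps only the rows whose value occurs more than twice ('a' here).
```

This avoids calling a Python lambda per group, which can matter when there are many groups.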