sql select group by a having count(1) > 1 equivalent in python pandas?

I'm having a hard time filtering grouped results in pandas. I want the equivalent of this SQL query:

select email, count(1) as cnt 
from customers 
group by email 
having count(email) > 1 
order by cnt desc

I did

customers.groupby('Email')['CustomerID'].size()

This gives me the emails and their respective counts correctly (stored below as email_cnt), but I can't reproduce the having count(email) > 1 part. I tried:

email_cnt[email_cnt.size > 1]

returns 1

email_cnt = customers.groupby('Email')
email_dup = email_cnt.filter(lambda x: len(x) > 1)

gives the full customer records for every email that occurs more than once, but I want the aggregated table of emails and counts.

608

asked Dec 31 '14 08:12

tangkk

2 Answers

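Instead of writing email_cnt[email_cnt.size > 1], just write email_cnt[email_cnt > 1]. There's no need to call .size again: email_cnt already holds the per-email counts, and on a Series .size is simply the total number of elements. The comparison email_cnt > 1 produces a Boolean Series, which is then used to select only the matching values of email_cnt.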

For example:

>>> import pandas as pd
>>> customers = pd.DataFrame({'Email':['foo','bar','foo','foo','baz','bar'],
...                           'CustomerID':[1,2,1,2,1,1]})
>>> email_cnt = customers.groupby('Email')['CustomerID'].size()
>>> email_cnt[email_cnt > 1]
Email
bar      2
foo      3
dtype: int64
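
To see why the original attempt fails, it can help to inspect the intermediate values. The sketch below reuses the sample data from the example above; the names n_groups, scalar_test, mask and dupes are illustrative only. On a Series, .size is an attribute holding the number of elements, so email_cnt.size > 1 is a single True/False value rather than a per-email mask.

```python
import pandas as pd

customers = pd.DataFrame({
    'Email': ['foo', 'bar', 'foo', 'foo', 'baz', 'bar'],
    'CustomerID': [1, 2, 1, 2, 1, 1],
})
email_cnt = customers.groupby('Email')['CustomerID'].size()

# Series.size is an attribute: here, the number of distinct emails.
n_groups = email_cnt.size
scalar_test = n_groups > 1      # one boolean for the whole Series, not a mask

# Comparing the Series itself yields one boolean per email.
mask = email_cnt > 1
dupes = email_cnt[mask]         # keeps only the emails whose count exceeds 1

print(n_groups, scalar_test)
print(mask.to_dict())
print(dupes.to_dict())
```

Indexing with the single boolean cannot pick out individual emails, whereas indexing with mask keeps exactly the rows where the count exceeds 1.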

168

answered Sep 18 '22 18:09

Alex Riley

Two other solutions, using the modern method-chaining approach:

Using selection by callable:

customers.groupby('Email').size().loc[lambda x: x>1].sort_values()

Using the query method:

(customers.groupby('Email')['CustomerID']
    .agg([len])
    .query('len > 1')
    .sort_values('len'))
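
Both chains sort in ascending order, and neither names the count column cnt as the SQL in the question does. A minimal sketch of a closer match follows, assuming the same sample customers frame as in the first answer; the variable name result is illustrative only, and this is one possible combination of the steps rather than the only way to write it.

```python
import pandas as pd

customers = pd.DataFrame({
    'Email': ['foo', 'bar', 'foo', 'foo', 'baz', 'bar'],
    'CustomerID': [1, 2, 1, 2, 1, 1],
})

result = (
    customers.groupby('Email')
    .size()                           # count(1) ... group by email
    .loc[lambda s: s > 1]             # having count(email) > 1
    .sort_values(ascending=False)     # order by cnt desc
    .reset_index(name='cnt')          # two-column table: Email, cnt
)
print(result)
```

Here reset_index(name='cnt') turns the counted Series back into a DataFrame, so the result has the same shape as the SQL output: one row per repeated email with its count.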

answered Sep 18 '22 18:09

Ilya V. Schurov

Tags: python, sql, pandas, dataframe
