I'm having a hard time filtering the groupby items in pandas. I want to do the equivalent of this SQL:
select email, count(1) as cnt
from customers
group by email
having count(email) > 1
order by cnt desc
I did
email_cnt = customers.groupby('Email')['CustomerID'].size()
and it gives me the list of emails and their respective counts correctly, but I am not able to achieve the having count(email) > 1 part.
email_cnt[email_cnt.size > 1]
returns 1
email_cnt = customers.groupby('Email')
email_dup = email_cnt.filter(lambda x: len(x) > 1)
gives me the full customer records for every email that appears more than once, but I want the aggregate table.
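Because filter() returns the original rows of every group that passes the test, one way to get the aggregate table from it is to group the filtered rows again and count them. A minimal sketch with made-up sample data; the threshold here is len(x) > 1 to match "more than one":

```python
import pandas as pd

# Hypothetical sample data standing in for the customers table
customers = pd.DataFrame({
    "Email": ["foo", "bar", "foo", "foo", "baz", "bar"],
    "CustomerID": [1, 2, 1, 2, 1, 1],
})

# filter() keeps the original rows of groups with more than one row...
email_dup = customers.groupby("Email").filter(lambda x: len(x) > 1)

# ...so grouping again collapses them into one count per email
dup_counts = email_dup.groupby("Email").size()
print(dup_counts)
```

This works, but it groups the data twice; filtering the already-aggregated counts is more direct.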
Use count() by column name: use pandas.DataFrame.groupby() to group the rows by a column, then call count() to get the count for each group, ignoring None and NaN values.
Calling size() or count() after pandas.DataFrame.groupby() gives the number of occurrences of each value in a particular column of the DataFrame. The difference is that size() counts every row, including rows with missing values, while count() counts only non-missing values.
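The difference between size() and count() only shows up when there are missing values. A small sketch with made-up data in which one CustomerID is NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one CustomerID is missing
df = pd.DataFrame({
    "Email": ["foo", "foo", "bar"],
    "CustomerID": [1, np.nan, 2],
})

sizes = df.groupby("Email")["CustomerID"].size()    # counts all rows per group
counts = df.groupby("Email")["CustomerID"].count()  # skips the NaN
print(sizes)   # foo -> 2
print(counts)  # foo -> 1
```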
Pandas DataFrame count() method: count() counts the non-empty (non-NA) values in each column, or in each row if you specify axis='columns', and returns a Series with the result for each column (or row).
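A quick sketch of both directions on a made-up frame with some missing cells:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values
df = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, 6]})

per_column = df.count()             # default axis=0: non-NA cells in each column
per_row = df.count(axis="columns")  # non-NA cells in each row
print(per_column)
print(per_row)
```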
Use sum() to count specific values in a column: compare the column to a value with == to get a Boolean Series, then call sum() on it. Each True counts as 1, so the result is the number of rows equal to that value.
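For example, counting how many rows have a particular (made-up) email:

```python
import pandas as pd

# Hypothetical sample column
df = pd.DataFrame({"Email": ["foo", "bar", "foo", "foo"]})

# The comparison gives a Boolean Series; True sums as 1, False as 0
n_foo = (df["Email"] == "foo").sum()
print(n_foo)  # 3
```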
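Instead of writing email_cnt[email_cnt.size > 1], just write email_cnt[email_cnt > 1] (there's no need to call .size again). email_cnt.size is the Series' size attribute, the total number of elements, so email_cnt.size > 1 is a single Boolean rather than one value per email. email_cnt > 1, by contrast, is a Boolean Series that selects only the relevant values of email_cnt.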
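To see the difference between the two expressions, here is a sketch using made-up sample data:

```python
import pandas as pd

# Hypothetical sample data
customers = pd.DataFrame({
    "Email": ["foo", "bar", "foo", "foo", "baz", "bar"],
    "CustomerID": [1, 2, 1, 2, 1, 1],
})
email_cnt = customers.groupby("Email")["CustomerID"].size()

print(email_cnt.size)      # Series.size attribute: number of elements (3 emails)
print(email_cnt.size > 1)  # a single bool, not a per-email mask
mask = email_cnt > 1       # element-wise comparison: one bool per email
print(mask)
```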
For example:
>>> import pandas as pd
>>> customers = pd.DataFrame({'Email':['foo','bar','foo','foo','baz','bar'],
...                           'CustomerID':[1,2,1,2,1,1]})
>>> email_cnt = customers.groupby('Email')['CustomerID'].size()
>>> email_cnt[email_cnt > 1]
Email
bar    2
foo    3
dtype: int64
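The SQL query also names the count column cnt, orders it descending, and returns a two-column table. Assuming you want a DataFrame in that shape rather than a Series, a sketch on the same sample data; rename() and reset_index() are additions beyond the snippet above:

```python
import pandas as pd

# Same hypothetical sample data as above
customers = pd.DataFrame({
    "Email": ["foo", "bar", "foo", "foo", "baz", "bar"],
    "CustomerID": [1, 2, 1, 2, 1, 1],
})

result = (
    customers.groupby("Email")       # GROUP BY email
    .size()                          # COUNT(1): rows per group
    .rename("cnt")                   # AS cnt
    .loc[lambda s: s > 1]            # HAVING count > 1
    .sort_values(ascending=False)    # ORDER BY cnt DESC
    .reset_index()                   # back to a two-column table
)
print(result)
```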
Two other solutions, using the modern "method chaining" approach:
Using selection by callable:
customers.groupby('Email').size().loc[lambda x: x > 1].sort_values(ascending=False)
Using the query method:
(customers.groupby('Email')['CustomerID']
    .agg([len])
    .query('len > 1')
    .sort_values('len', ascending=False))
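A third option, swapped in here rather than taken from the solutions above: Series.value_counts() already counts each distinct value and sorts the counts in descending order, so only the HAVING step remains. A sketch with the same made-up data:

```python
import pandas as pd

# Hypothetical sample data
customers = pd.DataFrame({
    "Email": ["foo", "bar", "foo", "foo", "baz", "bar"],
    "CustomerID": [1, 2, 1, 2, 1, 1],
})

# value_counts() counts each distinct email, sorted descending by default
vc = customers["Email"].value_counts()
dupes = vc[vc > 1]  # HAVING count > 1
print(dupes)
```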