
Python: Count Distinct User IDs that share the same email - Pandas Data Manipulation

Tags:

python

pandas

I want to return a dataframe that shows only the rows where a User_ID has more than one Email associated with it. In other words, I am trying to count how many distinct User_IDs share an email (see below).

Sample Data

   Unnamed: 0 First Name  ...  User_ID       Email
0           0        Bob  ...     2011   Bob@email
1           1       Dirk  ...     2012  jack@email
2           2      Sarah  ...     2013  Sara@email
3           3        max  ...     2015   Bob@email
4           4        leo  ...     2016  Sara@email

From the table above, my desired outcome would be something like this (note that I would drop counts of 0, as I am only interested in User IDs that share an email with at least one other User ID):

Output

User_ID   (Count of other User_IDs with same Email)
2011       1
2012       0
2013       1
2015       1
2016       1

In SQL, this would work something like the query below, which returns every User_ID with more than one distinct associated email. Can someone advise how I can do something similar in Python?

SELECT User_ID, COUNT(DISTINCT Email) AS Count
FROM dataframe
GROUP BY User_ID
HAVING COUNT(DISTINCT Email) > 1

In Python, I tried the following using the value_counts function, but I don't know how to make it produce the desired output above:

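import pandas as pd

df = pd.read_csv("data.csv")
# print(df['Email'].value_counts() > 1)
# Number of rows per email, as a plain list (the email labels are lost here)
emailList = list(df["Email"].value_counts())

# Keep every row whose email appears more than once
duplicates = df[df['Email'].duplicated(keep=False)]
print(duplicates.value_counts())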
Blackdynomite asked Mar 08 '26 22:03



1 Answer

Are you after this?

df.groupby('Email')['First Name'].value_counts()

If you want to keep only the rows whose email has more than one name, try:

df[df.groupby('Email')['First Name'].transform('count').gt(1)]

or, to list the names under each email:

df.groupby('Email')['First Name'].agg(list).to_frame('names')



                   names
Email
Bob@email     [Bob, max]
Sara@email  [Sarah, leo]
jack@email        [Dirk]
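
The snippets above group by name, whereas the question asks for a per-User_ID count of other User_IDs with the same email. A minimal sketch of one way to get that, assuming the columns are named exactly as in the sample data; the DataFrame is rebuilt by hand from the question, and the column name others_with_same_email is made up for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question (only the columns shown there).
df = pd.DataFrame({
    "First Name": ["Bob", "Dirk", "Sarah", "max", "leo"],
    "User_ID": [2011, 2012, 2013, 2015, 2016],
    "Email": ["Bob@email", "jack@email", "Sara@email", "Bob@email", "Sara@email"],
})

# For each row, count the distinct User_IDs that share its email,
# then subtract 1 so the row's own User_ID is not counted.
# "others_with_same_email" is a hypothetical column name.
df["others_with_same_email"] = (
    df.groupby("Email")["User_ID"].transform("nunique") - 1
)

# Rough counterpart of SQL's HAVING: keep only User_IDs that share an email.
shared = df.loc[
    df["others_with_same_email"] > 0,
    ["User_ID", "others_with_same_email"],
]
print(shared)
```

On the sample data this leaves 2011, 2013, 2015 and 2016, each with a count of 1, and drops 2012. Note that the comparison is case-sensitive, so Bob@email and bob@email would be treated as different addresses unless the column is normalized first.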
wwnde answered Mar 10 '26 12:03



