
Count occurrences of item in one dataframe in another

I am running into a problem and hope someone can assist. I have two DataFrames, each hundreds of thousands of rows long (one has over 200k rows, the other over 180k). The larger of the two contains unique users, while the smaller one does not. For example:

df1:
user1
user2
user3
user4
user5

df2:
user1
user1
user5
user4
user5
user5

What I need to do is take each user from df1, efficiently check whether it appears in df2, and count how many times it occurs.

Thanks!

Sam L asked Dec 11 '22 06:12



2 Answers

Use value_counts to count each user in df2, then map those counts onto df1:

df1['Newcount']=df1['df1:'].map(df2['df2:'].value_counts())
df1
Out[117]: 
    df1:  Newcount
0  user1       2.0
1  user2       NaN
2  user3       NaN
3  user4       1.0
4  user5       3.0
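The answer above leaves NaN for users that never appear in df2. A minimal, self-contained sketch of the same value_counts + map idea follows. It assumes both frames have a column named 'user' (the answer uses 'df1:' and 'df2:'), and the output column name 'count' is chosen only for illustration. Filling the missing values with 0 turns the result into integer counts.

```python
import pandas as pd

# Sample data from the question; the column name 'user' is an assumption.
df1 = pd.DataFrame({'user': ['user1', 'user2', 'user3', 'user4', 'user5']})
df2 = pd.DataFrame({'user': ['user1', 'user1', 'user5', 'user4', 'user5', 'user5']})

# value_counts() returns a Series indexed by user, holding the number of rows per user.
counts = df2['user'].value_counts()

# map() looks each df1 user up in that Series; users absent from df2 become NaN,
# so fill them with 0 and convert back to integers.
df1['count'] = df1['user'].map(counts).fillna(0).astype(int)

print(df1['count'].tolist())  # [2, 0, 0, 1, 3]
```

Because the lookup runs once over a precomputed Series, this should scale to frames of a few hundred thousand rows without an explicit Python loop.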
BENY answered Feb 11 '23 04:02



Assuming the relevant column in each DataFrame is called 'user', you can use

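import pandas as pd

# Count the rows per user in df2, then left-merge the counts onto df1.
# The counts are renamed to 'count' so they do not clash with df1's 'user' column.
pd.merge(
    df1,
    df2.user.groupby(df2.user).count().rename('count'),
    left_on='user',
    right_index=True,
    how='left')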

Explanation:

  • The groupby + count finds the number of occurrences of each user. It produces a Series whose index is the user and whose value is the count.

  • The merge left-joins those counts onto df1, so every row of df1 is kept and users that do not appear in df2 get NaN.
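The two steps can be sketched end to end as follows. This is an illustration under the same assumption that the column is called 'user'. It uses size() and to_frame('count') in place of the count() call above so that the counts arrive as a one-column DataFrame with an explicit name; the names per_user, merged and 'count' are made up for this example.

```python
import pandas as pd

# Sample data from the question; the column name 'user' is an assumption.
df1 = pd.DataFrame({'user': ['user1', 'user2', 'user3', 'user4', 'user5']})
df2 = pd.DataFrame({'user': ['user1', 'user1', 'user5', 'user4', 'user5', 'user5']})

# Step 1: number of rows per user in df2, indexed by user.
per_user = df2.groupby('user').size().to_frame('count')

# Step 2: left-merge on df1's 'user' column against the index of per_user,
# which keeps every row of df1 in its original order.
merged = pd.merge(df1, per_user, left_on='user', right_index=True, how='left')

# Users missing from df2 come back as NaN; treat them as zero occurrences.
merged['count'] = merged['count'].fillna(0).astype(int)

print(merged['count'].tolist())  # [2, 0, 0, 1, 3]
```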

Ami Tavory answered Feb 11 '23 05:02
