I am currently running into a problem and hoping someone can assist. I have two DataFrames of items that are hundreds of thousands of lines long (one over 200k rows, the other over 180k). The larger of the two contains unique user values, while the smaller one does not. For example:
df1:
user1
user2
user3
user4
user5
df2:
user1
user1
user5
user4
user5
user5
What I need to do is take each user from df1 and efficiently check whether it occurs in df2 and, if so, how many times.
Thanks!
Using value_counts
df1['Newcount'] = df1['df1:'].map(df2['df2:'].value_counts())
df1
Out[117]:
df1: Newcount
0 user1 2.0
1 user2 NaN
2 user3 NaN
3 user4 1.0
4 user5 3.0
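A self-contained sketch of the value_counts + map approach on the toy data above (here the column is assumed to be named 'user' rather than the literal 'df1:'/'df2:' headers, and missing users are filled with 0):

```python
import pandas as pd

# Toy frames mirroring the question; the column name 'user' is an assumption.
df1 = pd.DataFrame({'user': ['user1', 'user2', 'user3', 'user4', 'user5']})
df2 = pd.DataFrame({'user': ['user1', 'user1', 'user5', 'user4', 'user5', 'user5']})

# value_counts builds a user -> count Series; map looks each df1 user up in it.
df1['Newcount'] = df1['user'].map(df2['user'].value_counts())

# Users absent from df2 come back as NaN; fillna(0) turns them into zeros.
df1['Newcount'] = df1['Newcount'].fillna(0).astype(int)
print(df1)
```

Both value_counts and map are vectorized, so this stays fast even on hundreds of thousands of rows.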
Assuming the relevant column in each DataFrame is called 'user', you can use
pd.merge(
    df1,
    df2.user.groupby(df2.user).count().rename('count'),
    left_on='user',
    right_index=True,
    how='left')
Explanation:
The groupby + count will find the number of occurrences of each user. It creates a Series whose index is the user and whose values are the counts (renamed here to 'count' so it does not collide with df1's 'user' column). The merge then left-merges that result onto df1.
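Putting the steps above together into a runnable sketch (column name 'user' and the intermediate name 'counts' are assumptions; users missing from df2 show up as NaN after the left merge):

```python
import pandas as pd

df1 = pd.DataFrame({'user': ['user1', 'user2', 'user3', 'user4', 'user5']})
df2 = pd.DataFrame({'user': ['user1', 'user1', 'user5', 'user4', 'user5', 'user5']})

# groupby + count gives a Series indexed by user; rename it so the merged
# column does not collide with df1's existing 'user' column.
counts = df2.user.groupby(df2.user).count().rename('count')

# Left-merge the counts onto df1, matching df1's 'user' column against the
# index of the counts Series.
out = pd.merge(df1, counts, left_on='user', right_index=True, how='left')
print(out)
```

The `how='left'` keeps every row of df1, so users with no matches in df2 survive the merge (with a NaN count that can be filled with 0 if preferred).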