I have the following data:
userid itemid
1 1
1 1
1 3
1 4
2 1
2 2
2 3
I want to drop userIDs who has viewed the same itemID more than or equal to twice. For example, userid=1 has viewed itemid=1 twice, and thus I want to drop the entire record of userid=1. However, since userid=2 hasn't viewed the same item twice, I will leave userid=2 as it is.
So I want my data to be like the following:
userid itemid
2 1
2 2
2 3
Can someone help me?
import pandas as pd
df = pd.DataFrame({'userid':[1,1,1,1, 2,2,2],
'itemid':[1,1,3,4, 1,2,3] })
You can use duplicated
to determine the row level duplicates, then perform a groupby
on 'userid' to determine 'userid' level duplicates, then drop accordingly.
To drop without a threshold:
df = df[~df.duplicated(['userid', 'itemid']).groupby(df['userid']).transform('any')]
To drop with a threshold, use keep=False
in duplicated
, and sum over the Boolean column and compare against your threshold. For example, with a threshold of 3:
df = df[~df.duplicated(['userid', 'itemid'], keep=False).groupby(df['userid']).transform('sum').ge(3)]
The resulting output for no threshold:
userid itemid
4 2 1
5 2 2
6 2 3
filter
Was made for this. You can pass a function that returns a boolean that determines if the group passed the filter or not.
filter
and value_counts
Most generalizable and intuitive
df.groupby('userid').filter(lambda x: x.itemid.value_counts().max() < 2)
filter
and is_unique
special case when looking for n < 2
df.groupby('userid').filter(lambda x: x.itemid.is_unique)
userid itemid
4 2 1
5 2 2
6 2 3
Group the dataframe by users and items:
views = df.groupby(['userid','itemid'])['itemid'].count()
#userid itemid
#1 1 2 <=== The offending row
# 3 1
# 4 1
#2 1 1
# 2 1
# 3 1
#Name: dummy, dtype: int64
Find out who saw any item only once:
THRESHOLD = 2
viewed = ~(views.unstack() >= THRESHOLD).any(axis=1)
#userid
#1 False
#2 True
#dtype: bool
Combine the results and keep the 'good' rows:
combined = df.merge(pd.DataFrame(viewed).reset_index())
combined[combined[0]][['userid','itemid']]
# userid itemid
#4 2 1
#5 2 2
#6 2 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With