
Pandas cannot compute isin with a duplicate axis

My dataframe is something like this:

             userid           codeassigned         timestamp
15           553938              M1           1499371200000
15390        527638              M2           1599731200000
15389        521638              M2           1399901200000
15388        521638              M3           1439841200000
15387        553938              M4           1499521200000

I have taken a subset of this dataframe (user with latest timestamp) by doing:

df = df.sort_values('timestamp', ascending=False)
mask = df.duplicated('userid')
subset_df = df[~mask]

Now I want all the rows from the main dataframe whose (userid, timestamp) pair appears in subset_df (there can be multiple rows with the same [userid, timestamp] but different codeassigned values). To get them, I'm doing:

subset_df[['userid', 'timestamp']].isin(df)

However, I'm getting this error:

ValueError: cannot compute isin with a duplicate axis.

Any idea what I'm doing wrong?

Saurabh Verma asked Oct 28 '25 18:10


1 Answer

You need merge, which performs an inner join on the shared columns by default, against the filtered subset:

subset_df = df.loc[~mask, ['userid', 'timestamp']]

df = subset_df.merge(df)

Or, keeping subset_df as built in the question:

df = subset_df[['userid', 'timestamp']].merge(df)
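
As for why the original call fails: when isin is given a DataFrame, pandas requires that DataFrame to have unique index and column labels, and it compares values label by label rather than testing row membership. So even with a unique index, isin(df) would not select the matching (userid, timestamp) rows. Below is a minimal sketch of both behaviours. The sample frame is an assumption: it is modelled on the question, with a repeated index label (15) and one extra row (M5) added so that two rows share the same (userid, timestamp) pair.

```python
import pandas as pd

# Hypothetical sample data modelled on the question. The index label 15 is
# repeated on purpose, and the M5 row is invented so that two rows share the
# same (userid, timestamp) pair.
df = pd.DataFrame(
    {
        "userid": [553938, 527638, 521638, 521638, 553938, 553938],
        "codeassigned": ["M1", "M2", "M2", "M3", "M4", "M5"],
        "timestamp": [
            1499371200000,
            1599731200000,
            1399901200000,
            1439841200000,
            1499521200000,
            1499521200000,
        ],
    },
    index=[15, 15390, 15389, 15388, 15387, 15],
)

# Latest timestamp per user, keeping only the key columns.
df = df.sort_values("timestamp", ascending=False)
mask = df.duplicated("userid")
subset_df = df.loc[~mask, ["userid", "timestamp"]]

# isin with a DataFrame argument rejects frames with duplicate labels.
try:
    subset_df.isin(df)
    isin_error = None
except ValueError as exc:
    isin_error = str(exc)

# Inner join on the key columns returns every row of df that shares a
# (userid, timestamp) pair with subset_df, including both M4 and M5.
result = subset_df.merge(df, on=["userid", "timestamp"])
print(isin_error)
print(result)
```

Passing on=["userid", "timestamp"] explicitly is optional here, since merge joins on the common columns by default, but it makes the join keys obvious to a reader.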

jezrael answered Oct 30 '25 15:10
