I have two separate pandas dataframes (df1 and df2) which have multiple columns, but only one in common ('text').
I would like to find every row in df2 that does not have a match in any of the rows of the column that df2 and df1 have in common.
df1
A   B  text
45  2  score
33  5  miss
20  1  score
df2
C   D  text
.5  2  shot
.3  2  shot
.3  1  miss
Result df (remove the row containing miss, since it occurs in df1)
C   D  text
.5  2  shot
.3  2  shot
Is it possible to use the isin method in this scenario?
To find the positions at which two columns match, we first initialize a pandas dataframe with two columns of city names. Then we use numpy's where() to compare the values of the two columns. Called with only a condition, where() returns a tuple of arrays holding the indices where the two columns have the same value.
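A minimal sketch of that comparison follows. The column names (city_a, city_b) and the city values are made up for illustration; they do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two columns of city names (illustrative only)
cities = pd.DataFrame({
    "city_a": ["Paris", "Rome", "Oslo", "Lima"],
    "city_b": ["Paris", "Bonn", "Oslo", "Kyiv"],
})

# With a single condition, np.where returns a tuple of index arrays;
# element [0] holds the row positions where the two columns are equal.
matches = np.where(cities["city_a"] == cities["city_b"])[0]
print(matches)
```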
As you asked, you can do this efficiently using isin (without resorting to expensive merges).
>>> df2[~df2.text.isin(df1.text.values)]
C D text
0 0.5 2 shot
1 0.3 2 shot
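For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the two frames from the question and applies the same isin filter. The intermediate name in_df1 is only there for readability.

```python
import pandas as pd

# The two frames from the question
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1],
                    "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1],
                    "text": ["shot", "shot", "miss"]})

# True for each row of df2 whose text appears anywhere in df1.text
in_df1 = df2["text"].isin(df1["text"])

# Negate the mask to keep only the rows without a match
result = df2[~in_df1]
print(result)
```

Because the mask is built from df2 itself, it is aligned with df2's index, so the boolean indexing is safe regardless of how many rows df1 has.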
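You can merge them and keep only the rows of df2 that found no match in df1:
merged = df2.merge(df1[['text']].drop_duplicates(), on='text', how='left', indicator=True)
merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
or you can use isin:
df2[~df2.text.isin(df1.text)]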
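If you do want the merge route, one way to write it is as an anti-join using merge's indicator parameter. This is a sketch under the assumption that 'text' is the only join key; the helper names keys and merged are just for illustration.

```python
import pandas as pd

# The two frames from the question
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1],
                    "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1],
                    "text": ["shot", "shot", "miss"]})

# Deduplicate the key column so the left merge cannot multiply rows of df2
keys = df1[["text"]].drop_duplicates()

# indicator=True adds a _merge column: 'left_only' marks rows with no match
merged = df2.merge(keys, on="text", how="left", indicator=True)

# Keep the unmatched rows and drop the helper column
result = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(result)
```

The left merge keeps the rows of df2 in their original order, and filtering on _merge avoids relying on NaN values, which could also be present in the data for unrelated reasons.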