I have a dataframe like this:
    ID1   ID2
0   foo   bar
1  fizz  buzz
And another like this:
     ID1    ID2  Count Code
0    abc    def      1    A
1   fizz   buzz      5    A
2  fizz1  buzz2      3    C
3    foo    bar      6    Z
4    foo    bar      6    Z
What I would like to do is filter the second dataframe where ID1 and ID2 match a row in the first dataframe, and whenever there's a match I want to remove that row from the first dataframe to avoid duplicates. This would yield a dataframe that looks like this:
    ID1   ID2  Count Code
1  fizz  buzz      5    A
3   foo   bar      6    Z
I know I can do this by nesting for loops, stepping through all the rows, and manually removing a row from the first frame whenever I get a match, but I am wondering if there is a more Pythonic way to do this. I am not experienced in pandas, so there may be a much cleaner way that I do not know about. I was previously using .isin() but had to scrap it: each ID pair can exist in the dataframe up to N times, and I need the filtered frame to contain between 0 and N instances of a pair.
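The "consume each row of the first frame at most once" behavior can be sketched by numbering the repeats of each (ID1, ID2) pair with groupby().cumcount() and merging on that counter as well; this is an illustrative approach, not one taken from the answers below, and the helper column name _occ is made up here:

```python
import pandas as pd

# Reconstruction of the two frames from the question.
df1 = pd.DataFrame({'ID1': ['foo', 'fizz'], 'ID2': ['bar', 'buzz']})
df2 = pd.DataFrame({
    'ID1': ['abc', 'fizz', 'fizz1', 'foo', 'foo'],
    'ID2': ['def', 'buzz', 'buzz2', 'bar', 'bar'],
    'Count': [1, 5, 3, 6, 6],
    'Code': ['A', 'A', 'C', 'Z', 'Z'],
})

# Number each repeat of an (ID1, ID2) pair within each frame, then
# include that counter in the merge keys, so a pair that appears once
# in df1 matches at most one of its occurrences in df2.
key = ['ID1', 'ID2']
left = df1.assign(_occ=df1.groupby(key).cumcount())
right = df2.assign(_occ=df2.groupby(key).cumcount())
result = right.merge(left, on=key + ['_occ']).drop(columns='_occ')
# result holds one fizz/buzz row (Count 5) and exactly one foo/bar row (Count 6)
```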
Try this:
df2.merge(df1[['ID1','ID2']])
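As a runnable sketch with the sample frames from the question (note that this keeps every matching row of df2, including both duplicate foo/bar rows, since df1 only supplies the keys to filter on):

```python
import pandas as pd

df1 = pd.DataFrame({'ID1': ['foo', 'fizz'], 'ID2': ['bar', 'buzz']})
df2 = pd.DataFrame({
    'ID1': ['abc', 'fizz', 'fizz1', 'foo', 'foo'],
    'ID2': ['def', 'buzz', 'buzz2', 'bar', 'bar'],
    'Count': [1, 5, 3, 6, 6],
    'Code': ['A', 'A', 'C', 'Z', 'Z'],
})

# Inner merge on the shared columns (ID1, ID2) keeps only df2 rows
# whose pair appears in df1; duplicates within df2 survive.
out = df2.merge(df1[['ID1', 'ID2']])
# out has three rows: fizz/buzz and both foo/bar duplicates
```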
Use merge with drop_duplicates, if the only columns shared by both DataFrames are the join columns:
df = pd.merge(df1,df2.drop_duplicates())
print (df)
    ID1   ID2  Count Code
0   foo   bar      6    Z
1  fizz  buzz      5    A
If you need to check duplicates only in the ID columns:
df = pd.merge(df1,df2.drop_duplicates(subset=['ID1','ID2']))
print (df)
    ID1   ID2  Count Code
0   foo   bar      6    Z
1  fizz  buzz      5    A
If more columns overlap, add the on parameter:
df = pd.merge(df1, df2.drop_duplicates(), on=['ID1','ID2'])
If you do not remove the duplicate rows first:
df = pd.merge(df1,df2)
print (df)
    ID1   ID2  Count Code
0   foo   bar      6    Z
1   foo   bar      6    Z
2  fizz  buzz      5    A