I have a dataframe like this:
    ID1    ID2
0   foo    bar
1   fizz   buzz
And another like this:
    ID1    ID2    Count    Code   
0   abc    def      1        A
1   fizz   buzz     5        A
2   fizz1  buzz2    3        C
3   foo    bar      6        Z
4   foo    bar      6        Z
What I would like to do is filter the second dataframe where ID1 and ID2 match a row in the first dataframe, and whenever there's a match I want to remove that row from the first dataframe to avoid duplicates. This would yield a dataframe that looks like this:
    ID1    ID2    Count    Code   
1   fizz   buzz     5        A
3   foo    bar      6        Z
I know I can do this by nesting for loops, stepping through all the rows, and manually removing a row from the first frame whenever I get a match, but I am wondering if there is a more pythonic way to do this. I am not experienced with pandas, so there may be a much cleaner way that I do not know about. I was previously using .isin() but had to scrap it: each ID pair can exist in the dataframe up to N times, and I need the filtered frame to contain between 0 and N instances of a pair.
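For reference, the loop approach described above can be sketched like this (a naive baseline, assuming the two frames are named df1 and df2 as in the examples):

```python
import pandas as pd

# The ID pairs to match against
df1 = pd.DataFrame({'ID1': ['foo', 'fizz'], 'ID2': ['bar', 'buzz']})

# The dataframe to filter
df2 = pd.DataFrame({
    'ID1': ['abc', 'fizz', 'fizz1', 'foo', 'foo'],
    'ID2': ['def', 'buzz', 'buzz2', 'bar', 'bar'],
    'Count': [1, 5, 3, 6, 6],
    'Code': ['A', 'A', 'C', 'Z', 'Z'],
})

pairs = list(zip(df1['ID1'], df1['ID2']))  # remaining pairs from df1
keep = []
for idx, row in df2.iterrows():
    pair = (row['ID1'], row['ID2'])
    if pair in pairs:
        keep.append(idx)
        pairs.remove(pair)  # consume the pair so it matches only once

filtered = df2.loc[keep]
print(filtered)
```

This yields the desired output (rows 1 and 3), but iterates row by row in Python rather than using vectorized pandas operations.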
Try this:
df2.merge(df1[['ID1','ID2']])
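A minimal runnable sketch of this merge, assuming the two dataframes from the question are named df1 and df2:

```python
import pandas as pd

# The ID pairs to match against
df1 = pd.DataFrame({'ID1': ['foo', 'fizz'], 'ID2': ['bar', 'buzz']})

# The dataframe to filter
df2 = pd.DataFrame({
    'ID1': ['abc', 'fizz', 'fizz1', 'foo', 'foo'],
    'ID2': ['def', 'buzz', 'buzz2', 'bar', 'bar'],
    'Count': [1, 5, 3, 6, 6],
    'Code': ['A', 'A', 'C', 'Z', 'Z'],
})

# Inner merge on the shared columns keeps only the matching rows of df2
result = df2.merge(df1[['ID1', 'ID2']])
print(result)
```

Note that this bare merge keeps every matching row of df2, so the duplicated foo/bar row appears twice; if you need at most one row per pair, combine it with drop_duplicates.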
Use merge with drop_duplicates if the only shared columns in both DataFrames are the join keys:
df = pd.merge(df1,df2.drop_duplicates())
print (df)
    ID1   ID2  Count Code
0   foo   bar      6    Z
1  fizz  buzz      5    A
If you need to check for duplicates only in the ID columns:
df = pd.merge(df1,df2.drop_duplicates(subset=['ID1','ID2']))
print (df)
    ID1   ID2  Count Code
0   foo   bar      6    Z
1  fizz  buzz      5    A
If more columns overlap, add the on parameter:
df = pd.merge(df1, df2.drop_duplicates(), on=['ID1','ID2'])
If you do not remove the duplicate rows:
df = pd.merge(df1,df2)
print (df)
    ID1   ID2  Count Code
0   foo   bar      6    Z
1   foo   bar      6    Z
2  fizz  buzz      5    A
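A self-contained sketch of the drop_duplicates variant above, assuming the question's data in frames named df1 and df2:

```python
import pandas as pd

# The ID pairs to match against
df1 = pd.DataFrame({'ID1': ['foo', 'fizz'], 'ID2': ['bar', 'buzz']})

# The dataframe to filter; note the duplicated foo/bar row
df2 = pd.DataFrame({
    'ID1': ['abc', 'fizz', 'fizz1', 'foo', 'foo'],
    'ID2': ['def', 'buzz', 'buzz2', 'bar', 'bar'],
    'Count': [1, 5, 3, 6, 6],
    'Code': ['A', 'A', 'C', 'Z', 'Z'],
})

# Deduplicate df2 first so each ID pair contributes at most one row
df = pd.merge(df1, df2.drop_duplicates())
print(df)
```

Because df1 is the left frame here, the result follows df1's row order (foo/bar first, then fizz/buzz).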