I have two dataframes, dfA and dfB of identical columns. I'm looking to get only the records from dataframe dfB that are not present in dataframe dfA.
to be clear, I do not want to get the records in dfA that are not in dfB.
I managed to hack something together that works using this, but the code is not easy to understand and by extension not very pythonic.
I'm looking for a more elegant solution, perhaps using pandas join/merge/append but wasn't able to make it work.
Example of what I want:
dfA:
Date Category Price
1 2013-11-24 Coat 22.1
2 2013-11-24 Shirt 8.7
3 2013-11-01 Socks 9 <<< Only present in this df
dfB:
Date Category Price
1 2013-11-24 Coat 22.1
2 2013-11-24 Shirt 8.7
3 2013-11-24 Helmet 2.1 <<< Only present in this df
4 2013-11-24 Pants 10.7 <<< Only present in this df
Result:
Date Category Price
1 2013-11-24 Helmet 2.1
2 2013-11-24 Pants 10.7
One of the idiomatic approaches is to use merge(..., how='outer', indicator=True)
and to filter resulting DF by generated column _merge
:
In [18]: (A.merge(B, how='outer', indicator=True)
.query("_merge == 'right_only'")
.drop('_merge',1))
Out[18]:
Date Category Price
3 2013-11-24 Helmet 2.1
4 2013-11-24 Pants 10.7
Use merge
with outer join and filter by indicator column with boolean indexing
:
df = pd.merge(dfA, dfB, indicator=True, how='outer')
print (df)
Date Category Price _merge
0 2013-11-24 Coat 22.1 both
1 2013-11-24 Shirt 8.7 both
2 2013-11-01 Socks 9.0 left_only
3 2013-11-24 Helmet 2.1 right_only
4 2013-11-24 Pants 10.7 right_only
df1 = df[df['_merge'] == 'right_only'].drop('_merge', axis=1)
print (df1)
Date Category Price
3 2013-11-24 Helmet 2.1
4 2013-11-24 Pants 10.7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With