Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Joining pandas dataframes of identical columns and getting ONLY the items in table B not present in A

I have two dataframes, dfA and dfB of identical columns. I'm looking to get only the records from dataframe dfB that are not present in dataframe dfA.

to be clear, I do not want to get the records in dfA that are not in dfB.

I managed to hack something together that works using this, but the code is not easy to understand and by extension not very pythonic.

I'm looking for a more elegant solution, perhaps using pandas join/merge/append but wasn't able to make it work.

Example of what I want:

  dfA: 
  Date       Category  Price 
1 2013-11-24 Coat      22.1
2 2013-11-24 Shirt     8.7
3 2013-11-01 Socks     9     <<< Only present in this df

  dfB:
  Date       Category  Price 
1 2013-11-24 Coat      22.1
2 2013-11-24 Shirt     8.7
3 2013-11-24 Helmet    2.1   <<< Only present in this df
4 2013-11-24 Pants     10.7  <<< Only present in this df

  Result:
  Date       Category  Price 
1 2013-11-24 Helmet    2.1
2 2013-11-24 Pants     10.7
like image 816
Idodo Avatar asked Jan 03 '23 10:01

Idodo


2 Answers

One of the idiomatic approaches is to use merge(..., how='outer', indicator=True) and to filter resulting DF by generated column _merge:

In [18]: (A.merge(B, how='outer', indicator=True)
           .query("_merge == 'right_only'")
           .drop('_merge',1))
Out[18]:
         Date Category  Price
3  2013-11-24   Helmet    2.1
4  2013-11-24    Pants   10.7
like image 71
MaxU - stop WAR against UA Avatar answered Jan 05 '23 17:01

MaxU - stop WAR against UA


Use merge with outer join and filter by indicator column with boolean indexing:

df = pd.merge(dfA, dfB, indicator=True, how='outer')
print (df)
         Date Category  Price      _merge
0  2013-11-24     Coat   22.1        both
1  2013-11-24    Shirt    8.7        both
2  2013-11-01    Socks    9.0   left_only
3  2013-11-24   Helmet    2.1  right_only
4  2013-11-24    Pants   10.7  right_only

df1 = df[df['_merge'] == 'right_only'].drop('_merge', axis=1)
print (df1)
         Date Category  Price
3  2013-11-24   Helmet    2.1
4  2013-11-24    Pants   10.7
like image 43
jezrael Avatar answered Jan 05 '23 16:01

jezrael