
all rows in df1 that are NOT in df2

Tags:

python

pandas

I have a df (df1) that looks like:

import pandas as pd

df1 = pd.DataFrame([
        ['YYZ', 'SFO', 1],
        ['YYZ', 'YYD', 1],
        ['YYZ', 'EWR', 1],
        ['YYZ', 'DFW', 1],
        ['YYZ', 'LAX', 1],
        ['YYZ', 'YYC', 1]
    ], columns=['city1', 'city2', 'val'])

I have another df (df2) that is a subset of df1:

df2 = pd.DataFrame([
        ['YYZ', 'SFO', 1],
        ['YYZ', 'YYD', 1]
    ], columns=['city1', 'city2', 'val'])

I want all rows in df1 that are NOT in df2.

I've tried various options described in the post "conditional slicing based on values of 2 columns", but I haven't been able to get any of them to work.

Your help would be appreciated.

codingknob asked Dec 23 '22 21:12


1 Answer

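  • Use merge with indicator=True, which adds a _merge column recording whether each row was found in the left frame only, the right frame only, or both
  • Then use query to keep only the rows marked 'left_only'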

df1.merge(
    df2, how='outer', indicator=True
).query('_merge == "left_only"').drop(columns='_merge')

  city1 city2  val
2   YYZ   EWR    1
3   YYZ   DFW    1
4   YYZ   LAX    1
5   YYZ   YYC    1
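
The same idea can be wrapped in a small reusable helper. This is only a minimal sketch, not part of the original answer: the function name rows_not_in is hypothetical, and it assumes both frames share the same column names so that merge joins on all of them. It uses a left merge rather than an outer one, since rows that exist only in the right frame are never needed here, and it de-duplicates the right frame first so the merge does not multiply matched rows.

```python
import pandas as pd

df1 = pd.DataFrame([
        ['YYZ', 'SFO', 1],
        ['YYZ', 'YYD', 1],
        ['YYZ', 'EWR', 1],
        ['YYZ', 'DFW', 1],
        ['YYZ', 'LAX', 1],
        ['YYZ', 'YYC', 1]
    ], columns=['city1', 'city2', 'val'])

df2 = pd.DataFrame([
        ['YYZ', 'SFO', 1],
        ['YYZ', 'YYD', 1]
    ], columns=['city1', 'city2', 'val'])


def rows_not_in(left, right):
    """Hypothetical helper: rows of `left` with no exact match in `right`.

    Assumes both frames have the same columns; merge joins on all
    columns the two frames have in common.
    """
    merged = left.merge(right.drop_duplicates(), how='left', indicator=True)
    # Keep rows found only in `left`, then drop the helper column.
    only_left = merged.loc[merged['_merge'] == 'left_only']
    return only_left.drop(columns='_merge')


result = rows_not_in(df1, df2)
print(result)
```

Note that merge returns a fresh 0..n-1 index, so the row labels in the result line up with df1 here only because df1 uses the default index.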
piRSquared answered Dec 26 '22 11:12