 

Delete a row from a dataframe if its column values are found in another dataframe

import pandas as pd

df1 = pd.DataFrame({
    'vouchers': [100, 200, 300, 400],
    'units': [11, 12, 12, 13],
    'some_other_data': ['a', 'b', 'c', 'd'],
})
df2 = pd.DataFrame({
    'vouchers': [500, 200, 600, 300],
    'units': [11, 12, 12, 13],
    'some_other_data': ['b', 'd', 'c', 'a'],
})

Given the two dataframes above, I want to do the following: if a voucher from df1 is also found in df2 and its corresponding unit is the same, delete that entire row from df1.

So in this case the desired output would be:

df1 = {
    'vouchers': [100, 300, 400],
    'units': [11, 12, 13],
    'some_other_data': ['a', 'c', 'd'],
    }
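To make the rule concrete: a row of df1 is removed only when its voucher and unit occur together in the same row of df2. A minimal sketch using a plain Python set of (voucher, unit) pairs reproduces the desired output above; it assumes pandas is installed, and the names pairs_in_df2, keep and result are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    'vouchers': [100, 200, 300, 400],
    'units': [11, 12, 12, 13],
    'some_other_data': ['a', 'b', 'c', 'd'],
})
df2 = pd.DataFrame({
    'vouchers': [500, 200, 600, 300],
    'units': [11, 12, 12, 13],
    'some_other_data': ['b', 'd', 'c', 'a'],
})

# Every (voucher, unit) pair that occurs together in a row of df2
pairs_in_df2 = set(zip(df2['vouchers'], df2['units']))

# True for the df1 rows whose (voucher, unit) pair does NOT occur in df2
keep = [(v, u) not in pairs_in_df2 for v, u in zip(df1['vouchers'], df1['units'])]

# Boolean list indexing keeps df1's original index labels (0, 2, 3)
result = df1.loc[keep]
print(result)
```

This loops in Python, so for large frames the vectorised approaches in the answers are likely to be faster.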

What would be the best way to achieve this?

asked Dec 18 '22 19:12 by barciewicz


2 Answers

You can do this efficiently with index operations, using pd.Index.isin:

import pandas as pd
u = df1.set_index(['vouchers', 'units'])
# keep only rows whose (voucher, unit) pair does not appear as a pair in df2
df1[~u.index.isin(pd.MultiIndex.from_arrays([df2.vouchers, df2.units]))]

   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d
answered Apr 27 '23 05:04 by cs95
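The reason for comparing a MultiIndex rather than each column on its own is that Series.isin tests every column independently. In this data, voucher 300 and unit 12 both occur in df2, just not in the same row, so a column-wise check would wrongly drop that row. A self-contained sketch contrasting the two (the names naive, mask and result are illustrative; MultiIndex.from_frame is used here instead of from_arrays, with the same effect):

```python
import pandas as pd

df1 = pd.DataFrame({
    'vouchers': [100, 200, 300, 400],
    'units': [11, 12, 12, 13],
    'some_other_data': ['a', 'b', 'c', 'd'],
})
df2 = pd.DataFrame({
    'vouchers': [500, 200, 600, 300],
    'units': [11, 12, 12, 13],
    'some_other_data': ['b', 'd', 'c', 'a'],
})

keys = ['vouchers', 'units']

# Naive check: each column is tested on its own, so the row (300, 12) is flagged
# because 300 and 12 both occur somewhere in df2, though not in the same row.
naive = df1['vouchers'].isin(df2['vouchers']) & df1['units'].isin(df2['units'])

# Pairwise check: one MultiIndex entry per row, compared as whole tuples.
df1_keys = pd.MultiIndex.from_frame(df1[keys])
df2_keys = pd.MultiIndex.from_frame(df2[keys])
mask = df1_keys.isin(df2_keys)  # numpy bool array, one entry per df1 row

result = df1[~mask]
print(result)
```

Here naive is [False, True, True, False] while mask is [False, True, False, False], so only the pairwise version yields the desired output.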


Another option is merge with indicator=True: collect the index of the rows that matched in both frames, then remove them with drop:

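import pandas as pd
# merge() returns a new 0..n-1 index, so this assumes df1 has a default RangeIndex
# and that each (voucher, unit) pair occurs at most once in df2
idx=df1.merge(df2,on=['vouchers','units'],indicator=True,how='left').\
     loc[lambda x : x['_merge']=='both'].index
df1=df1.drop(idx,axis=0)
df1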
Out[374]: 
   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d
answered Apr 27 '23 04:04 by BENY
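Because merge() on columns ignores df1's index and returns a fresh 0..n-1 index, the labels taken from the merged frame only coincide with df1's labels when df1 has a default RangeIndex and df2 holds each (voucher, unit) pair at most once. A sketch of a variant that keeps the rows flagged 'left_only' instead of dropping the 'both' ones, using a positional mask so df1's own labels survive. The non-default index ['w', 'x', 'y', 'z'] is a hypothetical addition to show this.

```python
import pandas as pd

# df1 uses a hypothetical non-default index to show that df1's labels are preserved
df1 = pd.DataFrame(
    {
        'vouchers': [100, 200, 300, 400],
        'units': [11, 12, 12, 13],
        'some_other_data': ['a', 'b', 'c', 'd'],
    },
    index=['w', 'x', 'y', 'z'],
)
df2 = pd.DataFrame({
    'vouchers': [500, 200, 600, 300],
    'units': [11, 12, 12, 13],
    'some_other_data': ['b', 'd', 'c', 'a'],
})

keys = ['vouchers', 'units']

# Merge against the deduplicated key columns only: with how='left' each df1 row
# then appears exactly once in the result, in df1's original order.
merged = df1.merge(df2[keys].drop_duplicates(), on=keys, how='left', indicator=True)

# 'left_only' marks df1 rows with no matching (voucher, unit) pair in df2.
# Convert to a numpy array so the mask is applied by position, not by merged's index.
result = df1[(merged['_merge'] == 'left_only').to_numpy()]
print(result)
```

Selecting only the key columns of df2 also avoids the some_other_data_x / some_other_data_y suffixes that merging the full frames produces.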