import pandas as pd

df1 = pd.DataFrame({
    'vouchers': [100, 200, 300, 400],
    'units': [11, 12, 12, 13],
    'some_other_data': ['a', 'b', 'c', 'd'],
})
df2 = pd.DataFrame({
    'vouchers': [500, 200, 600, 300],
    'units': [11, 12, 12, 13],
    'some_other_data': ['b', 'd', 'c', 'a'],
})
Given two dataframes like the ones above, I want to do the following: if a voucher from df1 also appears in df2 with the same unit, delete that voucher's entire row from df1.
So in this case the desired output would be:
df1 = {
    'vouchers': [100, 300, 400],
    'units': [11, 12, 13],
    'some_other_data': ['a', 'c', 'd'],
}
What would be the best way to achieve this?
You can do this efficiently with index operations, using pd.Index.isin:
u = df1.set_index(['vouchers', 'units'])
df1[~u.index.isin(pd.MultiIndex.from_arrays([df2.vouchers, df2.units]))]
   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d
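For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes the question's data is loaded into real DataFrames, and it builds both key sets with pd.MultiIndex.from_frame instead of set_index; the names keys1, keys2 and result are only illustrative.

```python
import pandas as pd

# Sample data from the question, wrapped in DataFrames (an assumption:
# the question shows plain dicts).
df1 = pd.DataFrame({
    'vouchers': [100, 200, 300, 400],
    'units': [11, 12, 12, 13],
    'some_other_data': ['a', 'b', 'c', 'd'],
})
df2 = pd.DataFrame({
    'vouchers': [500, 200, 600, 300],
    'units': [11, 12, 12, 13],
    'some_other_data': ['b', 'd', 'c', 'a'],
})

# One (voucher, unit) tuple per row, for each frame.
keys1 = pd.MultiIndex.from_frame(df1[['vouchers', 'units']])
keys2 = pd.MultiIndex.from_frame(df2[['vouchers', 'units']])

# isin returns a positional boolean array: True where the df1 pair
# also occurs in df2. Negate it to keep the non-matching rows.
result = df1[~keys1.isin(keys2)]
print(result)
```

Because the mask is positional, this does not depend on what df1's index labels happen to be, and the surviving rows keep their original labels.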
You can also do this with merge and indicator=True: find the index of the rows that need to be removed, then remove them with drop.
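# Left-merge on both keys; '_merge' == 'both' marks df1 rows that also occur in df2.
# Note: the merged frame gets a fresh 0..n-1 index, so this matches df1's labels only if
# df1 has a default RangeIndex and df2 has no duplicate (vouchers, units) pairs.
idx = (df1.merge(df2, on=['vouchers', 'units'], indicator=True, how='left')
          .loc[lambda x: x['_merge'] == 'both'].index)
df1 = df1.drop(idx, axis=0)
df1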
Out[374]:
   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d
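A variation on the merge approach, sketched under the assumption that df1 might not have a default index or that df2 might contain repeated (vouchers, units) pairs. It merges against only the de-duplicated key columns of df2 and uses the indicator as a positional mask rather than as index labels; keys, merged, keep and result are illustrative names.

```python
import pandas as pd

# Sample data from the question, wrapped in DataFrames.
df1 = pd.DataFrame({
    'vouchers': [100, 200, 300, 400],
    'units': [11, 12, 12, 13],
    'some_other_data': ['a', 'b', 'c', 'd'],
})
df2 = pd.DataFrame({
    'vouchers': [500, 200, 600, 300],
    'units': [11, 12, 12, 13],
    'some_other_data': ['b', 'd', 'c', 'a'],
})

# Only the key columns matter; dropping duplicates guarantees the left
# merge returns exactly one row per df1 row, in df1's order.
keys = df2[['vouchers', 'units']].drop_duplicates()
merged = df1.merge(keys, on=['vouchers', 'units'], how='left', indicator=True)

# 'left_only' means the pair was not found in df2. Converting to a NumPy
# array makes the mask positional, so df1's index labels are irrelevant.
keep = (merged['_merge'] == 'left_only').to_numpy()
result = df1[keep]
print(result)
```

Merging on the key columns alone also avoids the some_other_data_x / some_other_data_y suffix columns that appear when the full df2 is merged.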