I have two pandas df x and y, both with the same 3 columns A B C (not nullable). I need to create a new df z, obtained by "subtracting from x the rows which are entirely identical to the rows of y", i.e. a
x left join y on x.A=y.A and x.B=y.B and x.C=y.C
where y.A is null
How would I do that? Got stuck with indexes, concat, merge, join, ...
Example:
dataframe x
A B C
q1 q2 q3
q4 q2 q3
q7 q2 q9
dataframe y
A B C
q4 q2 q3
dataframe z
A B C
q1 q2 q3
q7 q2 q9
A left join, or left merge, keeps every row from the left dataframe. Result from left-join or left-merge of two dataframes in Pandas. Rows in the left dataframe that have no corresponding join value in the right dataframe are left with NaN values.
To merge two pandas DataFrames on multiple columns use pandas. merge() method. merge() is considered more versatile and flexible and we also have the same method in DataFrame.
left: use only keys from left frame, similar to a SQL left outer join; preserve key order. right: use only keys from right frame, similar to a SQL right outer join; preserve key order. outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
I think need merge
with indicator and filter only rows from left
DataFrame
:
df = x.merge(y, indicator='i', how='outer').query('i == "left_only"').drop('i', 1)
print (df)
A B C
0 q1 q2 q3
2 q7 q2 q93
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With