I need to compare two dataframes of different sizes row-wise and print out the non-matching rows. Let's take the following two:
    df1 = DataFrame({'Buyer': ['Carl', 'Carl', 'Carl'], 'Quantity': [18, 3, 5]})
    df2 = DataFrame({'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'], 'Quantity': [2, 1, 18, 5]})
What is the most efficient way to go row-wise over df2 and print out the rows that are not in df1, e.g.
    Buyer  Quantity
    Carl          2
    Mark          1
Important: I do not want the following row included in the diff:

    Buyer  Quantity
    Carl          3
I have already tried the approaches from "Comparing two dataframes of different length row by row and adding columns for each row with equal value" and "Compare two DataFrames and output their differences side-by-side", but these do not match my problem.
The compare method in pandas (DataFrame.compare) shows the differences between two DataFrames. It aligns them row-wise and column-wise and presents the differing values side by side. It can only compare DataFrames of the same shape with identical row and column labels, so it cannot be applied directly to df1 and df2 here.
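To illustrate that limitation, here is a minimal sketch, assuming pandas 1.1 or later (where DataFrame.compare is available). The frames a and b are made-up, same-shaped examples that are not part of the question; the second half shows that the question's differently sized frames are rejected.

```python
import pandas as pd

# Hypothetical same-shaped frames, for illustration only.
a = pd.DataFrame({'Buyer': ['Carl', 'Carl', 'Carl'], 'Quantity': [18, 3, 5]})
b = pd.DataFrame({'Buyer': ['Carl', 'Mark', 'Carl'], 'Quantity': [18, 1, 5]})

# Same shape and labels: compare() keeps only the rows and columns that
# differ, with 'self' (a) and 'other' (b) values side by side.
same_shape_diff = a.compare(b)
print(same_shape_diff)

# The frames from the question have 3 and 4 rows, so their labels differ.
df1 = pd.DataFrame({'Buyer': ['Carl', 'Carl', 'Carl'], 'Quantity': [18, 3, 5]})
df2 = pd.DataFrame({'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'], 'Quantity': [2, 1, 18, 5]})

try:
    df1.compare(df2)
    compare_failed = False
except ValueError as err:
    # pandas refuses to compare frames that are not identically labeled.
    compare_failed = True
    print('compare() failed:', err)
```

This is why a join-based approach fits the question better than compare.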
Merge the two dfs using how='outer' and pass the param indicator=True. This will tell you whether each row is present in both, left only, or right only; you can then filter the merged df afterwards:
    In [22]:
    merged = df1.merge(df2, indicator=True, how='outer')
    merged[merged['_merge'] == 'right_only']

    Out[22]:
      Buyer  Quantity      _merge
    3  Carl         2  right_only
    4  Mark         1  right_only
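For reference, the same idea as a self-contained script. This is a sketch rather than a definitive implementation: the helper name rows_only_in_right is made up for illustration, and it assumes both frames share the same columns so that merge joins on all of them by default.

```python
import pandas as pd

df1 = pd.DataFrame({'Buyer': ['Carl', 'Carl', 'Carl'], 'Quantity': [18, 3, 5]})
df2 = pd.DataFrame({'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'], 'Quantity': [2, 1, 18, 5]})


def rows_only_in_right(left, right):
    """Return the rows of `right` that have no exact match in `left`.

    Hypothetical helper: with no `on` argument, merge joins on every
    column the two frames have in common.
    """
    merged = left.merge(right, how='outer', indicator=True)
    # '_merge' is 'both', 'left_only' or 'right_only' for each row.
    only_right = merged[merged['_merge'] == 'right_only']
    return only_right.drop(columns='_merge').reset_index(drop=True)


diff = rows_only_in_right(df1, df2)
print(diff)
```

Rows that exist only in df1, such as Carl 3, are labeled left_only and are filtered out, which matches the requirement in the question. Note that the index values of the intermediate merged frame can differ between pandas versions, which is why the helper resets the index.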