 

Diff of two Dataframes

I need to compare two DataFrames of different sizes row-wise and print out the non-matching rows. Let's take the following two:

    from pandas import DataFrame

    df1 = DataFrame({'Buyer': ['Carl', 'Carl', 'Carl'], 'Quantity': [18, 3, 5]})
    df2 = DataFrame({'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'], 'Quantity': [2, 1, 18, 5]})

What is the most efficient way to iterate row-wise over df2 and print out the rows that are not in df1? For example:

    Buyer  Quantity
    Carl          2
    Mark          1

Important: I do not want the following row, which appears only in df1, included in the diff:

    Buyer  Quantity
    Carl          3

I have already tried the approaches from "Comparing two dataframes of different length row by row and adding columns for each row with equal value" and "Compare two DataFrames and output their differences side-by-side", but neither fits my problem.

Andy asked Apr 27 '16 13:04





1 Answer

Merge the two DataFrames with how='outer' and pass indicator=True. This adds a _merge column that tells you whether each row is present in both, in the left only, or in the right only. You can then filter the merged DataFrame:

    In [22]: merged = df1.merge(df2, indicator=True, how='outer')
             merged[merged['_merge'] == 'right_only']

    Out[22]:
      Buyer  Quantity      _merge
    3  Carl         2  right_only
    4  Mark         1  right_only
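For reference, the same idea can be written as a self-contained script. This is only a sketch: the helper name rows_only_in_right is made up for illustration and is not part of pandas, and it assumes both DataFrames share the same column names so that merge joins on all of them.

```python
import pandas as pd

df1 = pd.DataFrame({'Buyer': ['Carl', 'Carl', 'Carl'],
                    'Quantity': [18, 3, 5]})
df2 = pd.DataFrame({'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'],
                    'Quantity': [2, 1, 18, 5]})


def rows_only_in_right(left, right):
    """Hypothetical helper: rows of `right` with no exact match in `left`."""
    # With no `on=` argument, merge joins on every column the frames share.
    merged = left.merge(right, how='outer', indicator=True)
    # indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'.
    only_right = merged[merged['_merge'] == 'right_only']
    # Drop the helper column and renumber the rows for a clean result.
    return only_right.drop(columns='_merge').reset_index(drop=True)


diff = rows_only_in_right(df1, df2)
print(diff)
```

Rows that exist only in df1, such as (Carl, 3), are tagged 'left_only' and are therefore filtered out, which matches the one-directional diff asked for in the question.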
EdChum answered Oct 01 '22 02:10
