
How to obtain the symmetric difference between two DataFrames?

In the Spark SQL 1.6 API (Scala), DataFrame has functions for intersect and except, but none for the symmetric difference. Obviously, a combination of union and except can be used to generate it:

df1.except(df2).union(df2.except(df1)) 

But this seems a bit awkward. In my experience, if something seems awkward, there's a better way to do it, especially in Scala.

Asked by WillD, Mar 24 '16 12:03




2 Answers

You can always rewrite it as:

df1.unionAll(df2).except(df1.intersect(df2)) 

Seriously though, UNION, INTERSECT and EXCEPT / MINUS are pretty much the standard set of SQL combining operators. I am not aware of any system that provides an XOR-like operation out of the box, most likely because it is trivial to implement using the other three and there is not much to optimize there.
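To check that the expression from the question and the one in this answer agree, here is a minimal sketch that uses plain Scala sets as a stand-in for DataFrame rows. The object name, the helper names and the sample values are made up for illustration, and sets only model the distinct-row behaviour of except and intersect; this is not Spark code.

```scala
object SymmetricDifferenceSketch {
  // Hypothetical sample data standing in for the rows of df1 and df2.
  val rows1: Set[Int] = Set(1, 2, 3, 4)
  val rows2: Set[Int] = Set(3, 4, 5, 6)

  // Mirrors df1.except(df2).union(df2.except(df1)) from the question.
  def viaTwoExcepts[A](a: Set[A], b: Set[A]): Set[A] =
    a.diff(b).union(b.diff(a))

  // Mirrors df1.unionAll(df2).except(df1.intersect(df2)) from this answer.
  def viaUnionMinusIntersect[A](a: Set[A], b: Set[A]): Set[A] =
    a.union(b).diff(a.intersect(b))

  def main(args: Array[String]): Unit = {
    // Both formulations contain exactly the elements 1, 2, 5 and 6.
    println(viaTwoExcepts(rows1, rows2))
    println(viaUnionMinusIntersect(rows1, rows2))
  }
}
```

Both helpers return the rows that appear in exactly one of the two inputs, so the choice between them is mostly a matter of taste.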

Answered by zero323, Oct 06 '22 03:10


Why not the below?

df1.except(df2) 
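
For comparison, a small sketch with plain Scala sets (hypothetical sample values and names, standing in for DataFrame rows rather than using Spark) suggests why a single except is one-sided: it keeps rows found only in the first input and drops rows found only in the second.

```scala
object OneSidedDifferenceSketch {
  // Hypothetical sample data standing in for the rows of df1 and df2.
  val rows1: Set[String] = Set("a", "b", "c")
  val rows2: Set[String] = Set("b", "c", "d")

  // Mirrors df1.except(df2): only rows unique to the first input.
  val oneSided: Set[String] = rows1.diff(rows2)

  // Mirrors df1.except(df2).union(df2.except(df1)): rows unique to either input.
  val symmetric: Set[String] = rows1.diff(rows2).union(rows2.diff(rows1))

  def main(args: Array[String]): Unit = {
    println(oneSided)  // contains only "a"
    println(symmetric) // contains "a" and "d"
  }
}
```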

Answered by Tal Barda, Oct 06 '22 04:10