In the SparkSQL
1.6 API (scala) Dataframe
has functions for intersect and except, but not one for difference. Obviously, a combination of union and except can be used to generate difference:
df1.except(df2).union(df2.except(df1))
But this seems a bit awkward. In my experience, if something seems awkward, there's a better way to do it, especially in Scala.
The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side. The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.
subtract() function is used for finding the subtraction of dataframe and other, element-wise. This function is essentially same as doing dataframe – other but with a support to substitute for missing data in one of the inputs.
To find the common rows between two DataFrames with merge(), use the parameter “how” as “inner” since it works like SQL Inner Join and this is what we want to achieve.
Use df. dates1-df. dates2 to find the difference between the two dates and then convert the result in the form of months.
You can always rewrite it as:
df1.unionAll(df2).except(df1.intersect(df2))
Seriously though this UNION
, INTERSECT
and EXCEPT
/ MINUS
is pretty much a standard set of SQL combining operators. I am not aware of any system which provides XOR like operation out of the box. Most likely because it is trivial to implement using other three and there is not much to optimize there.
why not the below?
df1.except(df2)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With