How to obtain the symmetric difference between two DataFrames?

Tags: scala, apache-spark, apache-spark-sql

In the Spark SQL 1.6 API (Scala), DataFrame has functions for intersect and except, but not one for symmetric difference. Obviously, a combination of union and except can be used to generate the symmetric difference (in 1.6 the DataFrame union method is called unionAll):

df1.except(df2).unionAll(df2.except(df1))

But this seems a bit awkward. In my experience, if something seems awkward, there's a better way to do it, especially in Scala.

802

asked Mar 24 '16 12:03

WillD

2 Answers

You can always rewrite it as:

df1.unionAll(df2).except(df1.intersect(df2))

Seriously though, UNION, INTERSECT and EXCEPT / MINUS are pretty much the standard set of SQL combining operators. I am not aware of any system which provides an XOR-like operation out of the box, most likely because it is trivial to implement using the other three and there is not much to optimize there.

183

answered Oct 06 '22 03:10

zero323
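
For anyone who wants to call this as a method, here is a minimal sketch of how the rewrite above could be wrapped. The names `DataFrameSetOps` and `symmetricDifference` are made up for illustration and are not part of Spark, and the sample data is invented. It assumes a local Spark setup with the 1.6-style `SQLContext` entry point; on Spark 2.0+ you would more likely build a `SparkSession` and call `union` instead of `unionAll`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object SymmetricDifferenceSketch {

  // Hypothetical helper (not a Spark API): adds symmetricDifference to DataFrame.
  implicit class DataFrameSetOps(val left: DataFrame) extends AnyVal {
    // Rows that appear in exactly one of the two DataFrames.
    // Both sides need the same number of columns with compatible types,
    // because these set operators match columns by position.
    def symmetricDifference(right: DataFrame): DataFrame =
      left.unionAll(right).except(left.intersect(right))
  }

  // Collect a two-column (Int, String) DataFrame into a Set for easy comparison.
  def rows(df: DataFrame): Set[(Int, String)] =
    df.collect().map(r => (r.getInt(0), r.getString(1))).toSet

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, purely for demonstration.
    val conf = new SparkConf().setMaster("local[*]").setAppName("symmetric-difference")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df1 = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
    val df2 = Seq((2, "b"), (3, "c"), (4, "d")).toDF("id", "value")

    // Formulation from the answer: (A union B) except (A intersect B).
    val viaIntersect = df1.symmetricDifference(df2)
    // Formulation from the question: (A except B) union (B except A).
    val viaExcept = df1.except(df2).unionAll(df2.except(df1))

    // Both contain exactly the rows (1, "a") and (4, "d").
    assert(rows(viaIntersect) == Set((1, "a"), (4, "d")))
    assert(rows(viaIntersect) == rows(viaExcept))

    sc.stop()
  }
}
```

Note that `except` and `intersect` follow SQL's distinct set semantics, so duplicate rows are collapsed in the result of either formulation.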

Why not the below?

df1.except(df2)

answered Oct 06 '22 04:10

Tal Barda
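
A quick way to see how `df1.except(df2)` on its own compares with the symmetric difference asked about in the question is to try the same operations on plain Scala sets. This is only an analogy, assuming that `except` behaves like set difference on distinct rows; the object name and values are made up for illustration.

```scala
object OneSidedVsSymmetric {
  val a: Set[Int] = Set(1, 2, 3)
  val b: Set[Int] = Set(2, 3, 4)

  // Analogue of df1.except(df2): only what is in a but not in b.
  val oneSided: Set[Int] = a diff b

  // Analogue of df1.except(df2) combined with df2.except(df1):
  // what is in exactly one of the two sets.
  val symmetric: Set[Int] = (a diff b) union (b diff a)

  def main(args: Array[String]): Unit = {
    println(oneSided)  // Set(1)
    println(symmetric) // Set(1, 4)
  }
}
```

The one-sided version never reports `4`, which exists only in the second set, so a single `except` covers just half of the symmetric difference.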
