The Spark SQL documentation specifies that join() supports the following join types:
Must be one of: inner, cross, outer, full, full_outer, left, left_outer, right, right_outer, left_semi, and left_anti.
Spark SQL Join()
Is there any difference between outer and full_outer? I suspect not, and that they are just synonyms for each other, but I wanted to get clarity.
Spark v2.4.0 join code (underscores are stripped from the join type string before it is matched, so full_outer is matched as fullouter):
case "inner" => Inner
case "outer" | "full" | "fullouter" => FullOuter
case "leftouter" | "left" => LeftOuter
case "rightouter" | "right" => RightOuter
case "leftsemi" => LeftSemi
case "leftanti" => LeftAnti
case "cross" => Cross
So Spark really supports: Inner, FullOuter, LeftOuter, RightOuter, LeftSemi, LeftAnti, and Cross.
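To make the alias resolution concrete, here is a minimal plain-Python sketch of the same lookup. It is not Spark or PySpark code: resolve_join_type and _ALIASES are hypothetical names invented for this illustration, and the lower-casing step is an assumption based on my reading of the 2.4.0 source (only the underscore stripping is stated above).

```python
# Hypothetical stand-in for the Scala match shown above; not a Spark API.
_ALIASES = {
    "inner": "Inner",
    "outer": "FullOuter",
    "full": "FullOuter",
    "fullouter": "FullOuter",
    "leftouter": "LeftOuter",
    "left": "LeftOuter",
    "rightouter": "RightOuter",
    "right": "RightOuter",
    "leftsemi": "LeftSemi",
    "leftanti": "LeftAnti",
    "cross": "Cross",
}


def resolve_join_type(name: str) -> str:
    """Lower-case the name, drop underscores, then look up the join type."""
    # Assumption: lower-casing happens before matching, as in the Scala source.
    key = name.lower().replace("_", "")
    if key not in _ALIASES:
        raise ValueError(f"Unsupported join type: {name!r}")
    return _ALIASES[key]


if __name__ == "__main__":
    for alias in ("outer", "full", "full_outer", "fullouter"):
        print(alias, "->", resolve_join_type(alias))
```

All four aliases in the loop resolve to FullOuter, which is why outer and full_outer behave identically.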
Quick example, given:
+---+-----+
| id|value|
+---+-----+
|  1|   A1|
|  2|   A2|
|  3|   A3|
|  4|   A4|
+---+-----+
and:
+---+-----+
| id|value|
+---+-----+
|  3|   A3|
|  4|   A4|
|  4| A4_1|
|  5|   A5|
|  6|   A6|
+---+-----+
You get:
OUTER JOIN
+----+-----+----+-----+
|  id|value|  id|value|
+----+-----+----+-----+
|null| null|   5|   A5|
|null| null|   6|   A6|
|   1|   A1|null| null|
|   2|   A2|null| null|
|   3|   A3|   3|   A3|
|   4|   A4|   4|   A4|
|   4|   A4|   4| A4_1|
+----+-----+----+-----+
FULL_OUTER JOIN
+----+-----+----+-----+
|  id|value|  id|value|
+----+-----+----+-----+
|null| null|   5|   A5|
|null| null|   6|   A6|
|   1|   A1|null| null|
|   2|   A2|null| null|
|   3|   A3|   3|   A3|
|   4|   A4|   4|   A4|
|   4|   A4|   4| A4_1|
+----+-----+----+-----+
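As a cross-check of the rows above, here is a plain-Python sketch of what a full outer join on id computes for these two tables. It does not use Spark: full_outer_join is a hypothetical helper written for this illustration, and it uses a simple nested loop, which is not a claim about how Spark executes the join. Row order also differs from the show() output above.

```python
# The two example tables as (id, value) tuples.
left = [(1, "A1"), (2, "A2"), (3, "A3"), (4, "A4")]
right = [(3, "A3"), (4, "A4"), (4, "A4_1"), (5, "A5"), (6, "A6")]


def full_outer_join(left_rows, right_rows):
    """Nested-loop full outer join on the first field (id) of each row."""
    null_row = (None, None)
    result = []
    matched_right = set()  # indexes of right rows that found a partner
    for l_row in left_rows:
        found = False
        for i, r_row in enumerate(right_rows):
            if l_row[0] == r_row[0]:
                result.append(l_row + r_row)
                matched_right.add(i)
                found = True
        if not found:
            # Left row with no partner: pad the right side with nulls.
            result.append(l_row + null_row)
    for i, r_row in enumerate(right_rows):
        if i not in matched_right:
            # Right row with no partner: pad the left side with nulls.
            result.append(null_row + r_row)
    return result


if __name__ == "__main__":
    for row in full_outer_join(left, right):
        print(row)
```

The result has the same seven rows as both tables above: two left-only rows, three matched rows (id 4 matches twice), and two right-only rows.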
There is no difference between outer and full_outer; they are the same. See the following answer for a demonstration: What are the various join types in Spark?