
Is there a difference between OUTER & FULL_OUTER in Spark SQL?

Spark SQL documentation specifies that join() supports the following join types:

Must be one of: inner, cross, outer, full, full_outer, left, left_outer, right, right_outer, left_semi, and left_anti.

Spark SQL Join()

Is there any difference between outer and full_outer? I suspect they are just synonyms for each other, but I wanted to confirm.

jamiet asked Oct 02 '17 09:10


2 Answers

Spark v2.4.0 join code (underscores are stripped from the join type name before it is matched, so full_outer is matched as fullouter):

case "inner" => Inner
case "outer" | "full" | "fullouter" => FullOuter
case "leftouter" | "left" => LeftOuter
case "rightouter" | "right" => RightOuter
case "leftsemi" => LeftSemi
case "leftanti" => LeftAnti
case "cross" => Cross

So Spark really supports only seven join types: Inner, FullOuter, LeftOuter, RightOuter, LeftSemi, LeftAnti, and Cross. The names outer, full, and full_outer all resolve to FullOuter.
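To make the aliasing concrete, here is a minimal standalone sketch of the same lookup with no Spark dependency. The JoinKind trait, its case objects, and JoinTypeSketch.parse are hypothetical stand-ins for Spark's internal types, and the lower-casing plus underscore removal is my reading of how the name is normalized before the match, so treat it as an approximation rather than the actual Spark source.

```scala
import java.util.Locale

// Hypothetical stand-ins for Spark's internal join type objects.
sealed trait JoinKind
case object Inner extends JoinKind
case object FullOuter extends JoinKind
case object LeftOuter extends JoinKind
case object RightOuter extends JoinKind
case object LeftSemi extends JoinKind
case object LeftAnti extends JoinKind
case object Cross extends JoinKind

object JoinTypeSketch {
  // Assumption: the name is lower-cased and underscores are dropped before
  // matching, so "full_outer", "FULL_OUTER" and "fullouter" take the same branch.
  def parse(name: String): JoinKind =
    name.toLowerCase(Locale.ROOT).replace("_", "") match {
      case "inner"                        => Inner
      case "outer" | "full" | "fullouter" => FullOuter
      case "leftouter" | "left"           => LeftOuter
      case "rightouter" | "right"         => RightOuter
      case "leftsemi"                     => LeftSemi
      case "leftanti"                     => LeftAnti
      case "cross"                        => Cross
      case other =>
        throw new IllegalArgumentException(s"Unsupported join type '$other'")
    }
}
```

With this sketch, JoinTypeSketch.parse("outer") and JoinTypeSketch.parse("full_outer") return the same FullOuter object, which is the whole point: the two strings differ only until they are normalized.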

Quick example, given:

+---+-----+
| id|value|
+---+-----+
|  1|   A1|
|  2|   A2|
|  3|   A3|
|  4|   A4|
+---+-----+

and:

+---+-----+
| id|value|
+---+-----+
|  3|   A3|
|  4|   A4|
|  4| A4_1|
|  5|   A5|
|  6|   A6|
+---+-----+

You get:

OUTER JOIN

+----+-----+----+-----+
|  id|value|  id|value|
+----+-----+----+-----+
|null| null|   5|   A5|
|null| null|   6|   A6|
|   1|   A1|null| null|
|   2|   A2|null| null|
|   3|   A3|   3|   A3|
|   4|   A4|   4|   A4|
|   4|   A4|   4| A4_1|
+----+-----+----+-----+

FULL_OUTER JOIN

+----+-----+----+-----+
|  id|value|  id|value|
+----+-----+----+-----+
|null| null|   5|   A5|
|null| null|   6|   A6|
|   1|   A1|null| null|
|   2|   A2|null| null|
|   3|   A3|   3|   A3|
|   4|   A4|   4|   A4|
|   4|   A4|   4| A4_1|
+----+-----+----+-----+
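
If you want to reproduce the comparison yourself, something along these lines should work. It is only a sketch: the object name OuterVsFullOuter, the helper joinedRows, the local[*] master, and the app name are my own choices, and it assumes Spark SQL is on the classpath. It builds the two example tables, runs the join once per type name, and compares the collected rows after sorting, since row order is not guaranteed.

```scala
import org.apache.spark.sql.SparkSession

object OuterVsFullOuter {
  // Joins the two example tables with the given join type name and returns
  // the result rows as sorted strings so two runs can be compared directly.
  def joinedRows(spark: SparkSession, how: String): Seq[String] = {
    import spark.implicits._
    val left  = Seq((1, "A1"), (2, "A2"), (3, "A3"), (4, "A4")).toDF("id", "value")
    val right = Seq((3, "A3"), (4, "A4"), (4, "A4_1"), (5, "A5"), (6, "A6")).toDF("id", "value")
    left.join(right, left("id") === right("id"), how)
      .collect()
      .map(_.toString)
      .sorted
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Local session just for this check; master and app name are arbitrary.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("outer-vs-full-outer")
      .getOrCreate()

    val outer     = joinedRows(spark, "outer")
    val fullOuter = joinedRows(spark, "full_outer")

    outer.foreach(println)
    // Expected to be true if both names resolve to the same join type.
    println(outer == fullOuter)

    spark.stop()
  }
}
```

In spark-shell you can skip the session setup and call OuterVsFullOuter.joinedRows(spark, "outer") directly with the shell's existing spark value.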
jgp answered Nov 07 '22 06:11


There is no difference between outer and full_outer - they are the same. See the following answer for a demonstration: What are the various join types in Spark?

dric answered Nov 07 '22 08:11