Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does SQL join work?

I am trying to understand how does joins work internally. What will be the difference between the way in which the following two queries would run?

For example

(A)

Select * 
FROM TABLE1
FULL JOIN TABLE2 ON TABLE1.ID = TABLE2.ID
FULL JOIN TABLE3 ON TABLE1.ID = TABLE3.ID

And

(B)

Select * 
FROM TABLE1
FULL JOIN TABLE2 ON TABLE1.ID = TABLE2.ID
FULL JOIN TABLE3 ON TABLE2.ID = TABLE3.ID

Edit: I am talking about oracle here. Consider some records present in table 2 and table 3 but not in table 1, query A would give two rows for that record but B would give only one row.

like image 290
MOZILLA Avatar asked Jan 13 '09 14:01

MOZILLA


People also ask

How do SQL joins work internally?

An SQL INNER JOIN is same as JOIN clause, combining rows from two or more tables. An inner join of A and B gives the result of A intersect B, i.e. the inner part of a Venn diagram intersection. Inner joins use a comparison operator to match rows from two tables based on the values in common columns from each table.

What does (+) mean in SQL joins?

The plus sign is Oracle syntax for an outer join. There isn't a minus operator for joins. An outer join means return all rows from one table. Also return the rows from the outer joined where there's a match on the join key. If there's no matching row, return null.

How does join work in databases?

A JOIN clause is used when you need to combine data from two or more tables into one data set. Records from both tables are matched based on a condition (also called a JOIN predicate) you specify in the JOIN clause. If the condition is met, the records are included in the output.

How does the join operator work?

The Join operator joins two sequences (collections) based on a key and returns a resulted sequence. The GroupJoin operator joins two sequences based on keys and returns groups of sequences. It is like Left Outer Join of SQL.


3 Answers

Your DBMS's optimiser will determine how best to perform the query. Usually this is done by "cost based optimisation", where a number of different query plans are considered and the most efficient one selected. If your two queries are logically identical, it is most likely that the optimiser will end up using the same query plan whichever way you write it. In fact, it would be a poor optimiser these days that produced different query plans based on such minor differences in the SQL.

However, full outer joins are a different matter (in Oracle at least), since the way the columns are joined influences the result. i.e. the 2 queries are not interchangeable.

You can use AUTOTRACE in SQL Plus to see the different plans:

SQL> select *
  2  from t1
  3  full join t2 on t2.id = t1.id
  4  full join t3 on t3.id = t2.id;

        ID         ID         ID
---------- ---------- ----------
                    1          1

1 row selected.


Execution Plan
----------------------------------------------------------

---------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     3 |   117 |    29  (11)|
|   1 |  VIEW                   |      |     3 |   117 |    29  (11)|
|   2 |   UNION-ALL             |      |       |       |            |
|*  3 |    HASH JOIN OUTER      |      |     2 |   142 |    15  (14)|
|   4 |     VIEW                |      |     2 |    90 |    11  (10)|
|   5 |      UNION-ALL          |      |       |       |            |
|*  6 |       HASH JOIN OUTER   |      |     1 |    91 |     6  (17)|
|   7 |        TABLE ACCESS FULL| T1   |     1 |    52 |     2   (0)|
|   8 |        TABLE ACCESS FULL| T2   |     1 |    39 |     3   (0)|
|*  9 |       HASH JOIN ANTI    |      |     1 |    26 |     6  (17)|
|  10 |        TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
|  11 |        TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)|
|  12 |     TABLE ACCESS FULL   | T3   |     1 |    26 |     3   (0)|
|* 13 |    HASH JOIN ANTI       |      |     1 |    26 |    15  (14)|
|  14 |     TABLE ACCESS FULL   | T3   |     1 |    13 |     3   (0)|
|  15 |     VIEW                |      |     2 |    26 |    11  (10)|
|  16 |      UNION-ALL          |      |       |       |            |
|* 17 |       HASH JOIN OUTER   |      |     1 |    39 |     6  (17)|
|  18 |        TABLE ACCESS FULL| T1   |     1 |    26 |     2   (0)|
|  19 |        TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
|* 20 |       HASH JOIN ANTI    |      |     1 |    26 |     6  (17)|
|  21 |        TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
|  22 |        TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)|
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T3"."ID"(+)="T2"."ID")
   6 - access("T2"."ID"(+)="T1"."ID")
   9 - access("T2"."ID"="T1"."ID")
  13 - access("T3"."ID"="T2"."ID")
  17 - access("T2"."ID"(+)="T1"."ID")
  20 - access("T2"."ID"="T1"."ID")

SQL> select *
  2  from t1
  3  full join t2 on t2.id = t1.id
  4  full join t3 on t3.id = t1.id;

        ID         ID         ID
---------- ---------- ----------
                    1
                               1

2 rows selected.


Execution Plan
----------------------------------------------------------

---------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     3 |   117 |    29  (11)|
|   1 |  VIEW                   |      |     3 |   117 |    29  (11)|
|   2 |   UNION-ALL             |      |       |       |            |
|*  3 |    HASH JOIN OUTER      |      |     2 |   142 |    15  (14)|
|   4 |     VIEW                |      |     2 |    90 |    11  (10)|
|   5 |      UNION-ALL          |      |       |       |            |
|*  6 |       HASH JOIN OUTER   |      |     1 |    91 |     6  (17)|
|   7 |        TABLE ACCESS FULL| T1   |     1 |    52 |     2   (0)|
|   8 |        TABLE ACCESS FULL| T2   |     1 |    39 |     3   (0)|
|*  9 |       HASH JOIN ANTI    |      |     1 |    26 |     6  (17)|
|  10 |        TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
|  11 |        TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)|
|  12 |     TABLE ACCESS FULL   | T3   |     1 |    26 |     3   (0)|
|* 13 |    HASH JOIN ANTI       |      |     1 |    26 |    15  (14)|
|  14 |     TABLE ACCESS FULL   | T3   |     1 |    13 |     3   (0)|
|  15 |     VIEW                |      |     2 |    26 |    11  (10)|
|  16 |      UNION-ALL          |      |       |       |            |
|* 17 |       HASH JOIN OUTER   |      |     1 |    39 |     6  (17)|
|  18 |        TABLE ACCESS FULL| T1   |     1 |    26 |     2   (0)|
|  19 |        TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
|* 20 |       HASH JOIN ANTI    |      |     1 |    26 |     6  (17)|
|  21 |        TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
|  22 |        TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)|
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T3"."ID"(+)="T1"."ID")
   6 - access("T2"."ID"(+)="T1"."ID")
   9 - access("T2"."ID"="T1"."ID")
  13 - access("T3"."ID"="T1"."ID")
  17 - access("T2"."ID"(+)="T1"."ID")
  20 - access("T2"."ID"="T1"."ID")

In fact, the query plans are identical except for the Predicate information

like image 200
Tony Andrews Avatar answered Oct 12 '22 07:10

Tony Andrews


Use EXPLAIN PLAN:

http://download.oracle.com/docs/cd/B10500_01/server.920/a96533/ex_plan.htm

like image 28
mbeckish Avatar answered Oct 12 '22 07:10

mbeckish


You stated interest in "internals", and then asked an example that illustrates "semantics". I'm answering semantics.

Consider these tables.

Table1 : 1, 4, 6
Table2 : 2, 4, 5
Table3 : 3, 5, 6

Both examples perform the same join first, so I'll perform that here.

FirstResult = T1 FULL JOIN T2 : (T1, T2)
(1, null)
(4, 4)
(6, null)
(null, 2)
(null, 5)

Example (A)

FirstResult FULL JOIN T3 ON FirstItem : (T1, T2, T3)

(1, null, null)
(4, 4, null)
(6, null, 6)   <----
(null, 2, null)
(null, 5, null)   <----
(null, null, 3)

Example (B)

FirstResult FULL JOIN T3 ON SecondItem : (T1, T2, T3)
(1, null, null)
(4, 4, null)
(6, null, null)   <----
(null, 2, null)
(null, 5, 5)   <----
(null, null, 3)

This shows you logically how to produce the results from the joins.

For "internals", there's something called a query optimizer, which will produce these same results - but it will make implementation choices to do the computation/io fast. These choices include:

  • which tables to access first
  • look into a table using an index or table scan
  • which join implementation type to use (nested loop, merge, hash).

Also note: due to the optimizer making these choices, and changing these choices based on what it considers to be optimal - the order of the results can change. The default ordering of results is always "what is easiest". If you don't want the default ordering, you need to specify ordering in your query.

To see exactly what the optimizer will do with a query (at that moment, because it can change its mind), you need to view the execution plan.

like image 21
Amy B Avatar answered Oct 12 '22 08:10

Amy B