Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are left outer joins associative?

It's easy to understand why left outer joins are not commutative, but I'm having some trouble understanding whether they are associative. Several online sources suggest that they are not, but I haven't managed to convince myself that this is the case.

Suppose we have three tables: A, B, and C.

Let A contain two columns, ID and B_ID, where ID is the primary key of table A and B_ID is a foreign key corresponding to the primary key of table B.

Let B contain two columns, ID and C_ID, where ID is the primary key of table B and C_ID is a foreign key corresponding to the primary key of table C.

Let C contain two columns, ID and VALUE, where ID is the primary key of table C and VALUE just contains some arbitrary values.

Then shouldn't (A left outer join B) left outer join C be equal to A left outer join (B left outer join C)?

like image 323
Tianxiang Xiong Avatar asked Nov 16 '13 18:11

Tianxiang Xiong


People also ask

Is left outer join commutative?

Outer joins are not commutative. Therefore, a LEFT JOIN b is not the same as b LEFT JOIN a. Outer joins are not associative either.

Are joins associative?

Inner and full outer joins are both commutative and associative, i.e. the following is fair for them: A [FULL | INNER] JOIN B = B [FULL | INNER] JOIN A.

Is right outer join commutative?

In a chain of full outer joins, the first join is commutative but the remaining joins are not necessarily commutative. Thus: A full outer join B full outer join C = B full outer join A full outer join C, but. A full outer join B full outer join C does not equal C full outer join A full outer join B.

What happens in left outer join?

A left outer join is a method of combining tables. The result includes unmatched rows from only the table that is specified before the LEFT OUTER JOIN clause. If you are joining two tables and want the result set to include unmatched rows from only one table, use a LEFT OUTER JOIN clause or a RIGHT OUTER JOIN clause.


2 Answers

In this thread, it is said, that they are not associative: Is LEFT OUTER JOIN associative?

However, I've found some book online where it is stated, that OUTER JOINs are associative, when the tables on the far left side and far right side have no attributes in common (here).

Here is a graphical presentation (MSPaint ftw):

Graphical presentation of impact of left outer joins order

Another way to look at it:

Since you said that table A joins with B, and B joins with C, then:

  • When you first join A and B, you are left with all records from A. Some of them have values from B. Now, for some of those rows for which you got value from B, you get values from C.
  • When you first join B and C, you and up with the whole table B, where some of the records have values from C. Now, you take all records from A and join some of them with all rows from B joined with C. Here, again, you get all rows from A, but some of them have values from B, some of which have values from C.

I don't see any possibility where, in conditons described by you, there would be a data loss depending on the sequence of LEFT joins.

Basing on the data provided by Tilak in his answer (which is now deleted), I've built a simple test case:

CREATE TABLE atab (id NUMBER, val VARCHAR2(10));
CREATE TABLE btab (id NUMBER, val VARCHAR2(10));
CREATE TABLE ctab (id NUMBER, val VARCHAR2(10));

INSERT INTO atab VALUES (1, 'A1');
INSERT INTO atab VALUES (2, 'A2');
INSERT INTO atab VALUES (3, 'A3');

INSERT INTO btab VALUES (1, 'B1');
INSERT INTO btab VALUES (2, 'B2');
INSERT INTO btab VALUES (4, 'B4');

INSERT INTO ctab VALUES (1, 'C1');
INSERT INTO ctab VALUES (3, 'C3');
INSERT INTO ctab VALUES (5, 'C5');

SELECT ab.aid, ab.aval, ab.bval, c.val AS cval
  FROM (
    SELECT a.id AS aid, a.val AS aval, b.id AS bid, b.val AS bval
      FROM atab a LEFT OUTER JOIN btab b ON (a.id = b.id)
    ) ab
    LEFT OUTER JOIN ctab c ON (ab.bid = c.id)
ORDER BY ab.aid
;
       AID AVAL       BVAL       CVAL     
---------- ---------- ---------- ----------
         1 A1         B1         C1         
         2 A2         B2                    
         3 A3                               
SELECT a.id, a.val AS aval, bc.bval, bc.cval
  FROM
    atab a
    LEFT OUTER JOIN (
      SELECT b.id AS bid, b.val AS bval, c.id AS cid, c.val AS cval
        FROM btab b LEFT OUTER JOIN ctab c ON (b.id = c.id)
    ) bc
      ON (a.id = bc.bid)
ORDER BY a.id
;
        ID AVAL       BVAL       CVAL     
---------- ---------- ---------- ----------
         1 A1         B1         C1         
         2 A2         B2                    
         3 A3                            

It seems in this particular example, that both solutions give the same result. I can't think of any other dataset that would make those queries return different results.

Check at SQLFiddle:

  • MySQL
  • Oracle
  • PostgreSQL
  • SQLServer
like image 91
Przemyslaw Kruglej Avatar answered Oct 11 '22 12:10

Przemyslaw Kruglej


If you're assuming that you're JOINing on a foreign key, as your question seems to imply, then yes, I think OUTER JOIN is guaranteed to be associative, as covered by Przemyslaw Kruglej's answer.

However, given that you haven't actually specified the JOIN condition, the pedantically correct answer is that no, they're not guaranteed to be associative. There are two easy ways to violate associativity with perverse ON clauses.

1. One of the JOIN conditions involves columns from all 3 tables

This is a pretty cheap way to violate associativity, but strictly speaking nothing in your question forbade it. Using the column names suggested in your question, consider the following two queries:

-- This is legal
SELECT * FROM (A JOIN B ON A.b_id = B.id) 
              JOIN C ON (A.id = B.id) AND (B.id = C.id)


-- This is not legal
SELECT * FROM A
              JOIN (B JOIN C ON (A.id = B.id) AND (B.id = C.id))
              ON A.b_id = B.id

The bottom query isn't even a valid query, but the top one is. Clearly this violates associativity.

2. One of the JOIN conditions can be satisfied despite all fields from one table being NULL

This way, we can even have different numbers of rows in our result set depending upon the order of the JOINs. For example, let the condition for JOINing A on B be A.b_id = B.id, but the condition for JOINing B on C be B.id IS NULL.

Thus we get these two queries, with very different output:

SELECT * FROM (A LEFT OUTER JOIN B ON A.b_id = B.id) 
              LEFT OUTER JOIN C ON B.id IS NULL;


SELECT * FROM A 
              LEFT OUTER JOIN (B LEFT OUTER JOIN C ON B.id IS NULL)
              ON A.b_id = B.id;

You can see this in action here: http://sqlfiddle.com/#!9/d59139/1

like image 18
Mark Amery Avatar answered Oct 11 '22 10:10

Mark Amery