Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using IS NULL or IS NOT NULL on join conditions - Theory question

Theory question here:

Why does specifying table.field IS NULL or table.field IS NOT NULL not work on a join condition (left or right join for instance) but only in the where condition?

Non working Example:

-this should return all shipments with any returns (non null values) filtered out. However, this returns all shipments regardless if anything meets the [r.id is null] statement.

SELECT
  *
FROM 
  shipments s
LEFT OUTER JOIN returns r  
  ON s.id = r.id
  AND r.id is null
WHERE
  s.day >= CURDATE() - INTERVAL 10 DAY 

Working example:

-This returns the correct amount of rows which is total shipments, less any related to a returns (non null values).

SELECT
  *
FROM 
  shipments s
LEFT OUTER JOIN returns r  
  ON s.id = r.id
WHERE
  s.day >= CURDATE() - INTERVAL 10 DAY
  AND r.id is null

Why is this the case? All other filter conditions between two tables being joined work just fine, but for some reason IS NULL and IS NOT NULL filters do not work unless in the where statement.

What is the reason for this?

like image 393
JoshG Avatar asked Jul 08 '11 21:07

JoshG


6 Answers

Example with tables A and B:

 A (parent)       B (child)    
============    =============
 id | name        pid | name 
------------    -------------
  1 | Alex         1  | Kate
  2 | Bill         1  | Lia
  3 | Cath         3  | Mary
  4 | Dale       NULL | Pan
  5 | Evan  

If you want to find parents and their kids, you do an INNER JOIN:

SELECT id,  parent.name AS parent
     , pid, child.name  AS child

FROM
        parent  INNER JOIN  child
  ON   parent.id     =    child.pid

Result is that every match of a parent's id from the left table and a child's pid from the second table will show as a row in the result:

+----+--------+------+-------+
| id | parent | pid  | child | 
+----+--------+------+-------+
|  1 | Alex   |   1  | Kate  |
|  1 | Alex   |   1  | Lia   |
|  3 | Cath   |   3  | Mary  |
+----+--------+------+-------+

Now, the above does not show parents without kids (because their ids do not have a match in child's ids, so what do you do? You do an outer join instead. There are three types of outer joins, the left, the right and the full outer join. We need the left one as we want the "extra" rows from the left table (parent):

SELECT id,  parent.name AS parent
     , pid, child.name  AS child

FROM
        parent  LEFT JOIN  child
  ON   parent.id    =    child.pid

Result is that besides previous matches, all parents that do not have a match (read: do not have a kid) are shown too:

+----+--------+------+-------+
| id | parent | pid  | child | 
+----+--------+------+-------+
|  1 | Alex   |   1  | Kate  |
|  1 | Alex   |   1  | Lia   |
|  3 | Cath   |   3  | Mary  |
|  2 | Bill   | NULL | NULL  |
|  4 | Dale   | NULL | NULL  |
|  5 | Evan   | NULL | NULL  |
+----+--------+------+-------+

Where did all those NULL come from? Well, MySQL (or any other RDBMS you may use) will not know what to put there as these parents have no match (kid), so there is no pid nor child.name to match with those parents. So, it puts this special non-value called NULL.

My point is that these NULLs are created (in the result set) during the LEFT OUTER JOIN.


So, if we want to show only the parents that do NOT have a kid, we can add a WHERE child.pid IS NULL to the LEFT JOIN above. The WHERE clause is evaluated (checked) after the JOIN is done. So, it's clear from the above result that only the last three rows where the pid is NULL will be shown:

SELECT id,  parent.name AS parent
     , pid, child.name  AS child

FROM
        parent  LEFT JOIN  child
  ON   parent.id    =    child.pid

WHERE child.pid IS NULL

Result:

+----+--------+------+-------+
| id | parent | pid  | child | 
+----+--------+------+-------+
|  2 | Bill   | NULL | NULL  |
|  4 | Dale   | NULL | NULL  |
|  5 | Evan   | NULL | NULL  |
+----+--------+------+-------+

Now, what happens if we move that IS NULL check from the WHERE to the joining ON clause?

SELECT id,  parent.name AS parent
     , pid, child.name  AS child

FROM
        parent  LEFT JOIN  child
  ON   parent.id    =    child.pid
  AND  child.pid IS NULL

In this case the database tries to find rows from the two tables that match these conditions. That is, rows where parent.id = child.pid AND child.pid IN NULL. But it can find no such match because no child.pid can be equal to something (1, 2, 3, 4 or 5) and be NULL at the same time!

So, the condition:

ON   parent.id    =    child.pid
AND  child.pid IS NULL

is equivalent to:

ON   1 = 0

which is always False.

So, why does it return ALL rows from the left table? Because it's a LEFT JOIN! And left joins return rows that match (none in this case) and also rows from the left table that do not match the check (all in this case):

+----+--------+------+-------+
| id | parent | pid  | child | 
+----+--------+------+-------+
|  1 | Alex   | NULL | NULL  |
|  2 | Bill   | NULL | NULL  |
|  3 | Cath   | NULL | NULL  |
|  4 | Dale   | NULL | NULL  |
|  5 | Evan   | NULL | NULL  |
+----+--------+------+-------+

I hope the above explanation is clear.



Sidenote (not directly related to your question): Why on earth doesn't Pan show up in none of our JOINs? Because his pid is NULL and NULL in the (not common) logic of SQL is not equal to anything so it can't match with any of the parent ids (which are 1,2,3,4 and 5). Even if there was a NULL there, it still wouldn't match because NULL does not equal anything, not even NULL itself (it's a very strange logic, indeed!). That's why we use the special check IS NULL and not a = NULL check.

So, will Pan show up if we do a RIGHT JOIN ? Yes, it will! Because a RIGHT JOIN will show all results that match (the first INNER JOIN we did) plus all rows from the RIGHT table that don't match (which in our case is one, the (NULL, 'Pan') row.

SELECT id,  parent.name AS parent
     , pid, child.name  AS child

FROM
        parent  RIGHT JOIN  child
  ON   parent.id     =    child.pid

Result:

+------+--------+------+-------+
| id   | parent | pid  | child | 
+---------------+------+-------+
|   1  | Alex   |   1  | Kate  |
|   1  | Alex   |   1  | Lia   |
|   3  | Cath   |   3  | Mary  |
| NULL | NULL   | NULL | Pan   |
+------+--------+------+-------+

Unfortunately, MySQL does not have FULL JOIN. You can try it in other RDBMSs, and it will show:

+------+--------+------+-------+
|  id  | parent | pid  | child | 
+------+--------+------+-------+
|   1  | Alex   |   1  | Kate  |
|   1  | Alex   |   1  | Lia   |
|   3  | Cath   |   3  | Mary  |
|   2  | Bill   | NULL | NULL  |
|   4  | Dale   | NULL | NULL  |
|   5  | Evan   | NULL | NULL  |
| NULL | NULL   | NULL | Pan   |
+------+--------+------+-------+
like image 63
ypercubeᵀᴹ Avatar answered Oct 01 '22 19:10

ypercubeᵀᴹ


The NULL part is calculated AFTER the actual join, so that is why it needs to be in the where clause.

like image 35
Sabeen Malik Avatar answered Oct 01 '22 20:10

Sabeen Malik


Actually NULL filter is not being ignored. Thing is this is how joining two tables work.

I will try to walk down with the steps performed by database server to make it understand. For example when you execute the query which you said is ignoring the NULL condition. SELECT * FROM shipments s LEFT OUTER JOIN returns r
ON s.id = r.id AND r.id is null WHERE s.day >= CURDATE() - INTERVAL 10 DAY

1st thing happened is all the rows from table SHIPMENTS get selected

on next step database server will start selecting one by one record from 2nd(RETURNS) table.

on third step the record from RETURNS table will be qualified against the join conditions you have provided in the query which in this case is (s.id = r.id and r.id is NULL)

note that this qualification applied on third step only decides if server should accept or reject the current record of RETURNS table to append with the selected row of SHIPMENT table. It can in no way effect the selection of record from SHIPMENT table.

And once server is done with joining two tables which contains all the rows of SHIPMENT table and selected rows of RETURNS table it applies the where clause on the intermediate result. so when you put (r.id is NULL) condition in where clause than all the records from the intermediate result with r.id = null gets filtered out.

like image 42
Muhammad Usama Avatar answered Oct 01 '22 18:10

Muhammad Usama


You're doing a LEFT OUTTER JOIN which indicates that you want every tuple from the table on the LEFT of the statement regardless of it has a matching record in the RIGHT table. This being the case, your results are being pruned from the RIGHT table but you're ending up with the same results as if you didn't include the AND at all within the ON clause.

Performing the AND in the WHERE clause causes the prune to happen after the LEFT JOIN takes place.

like image 22
Suroot Avatar answered Oct 01 '22 19:10

Suroot


The WHERE clause is evaluated after the JOIN conditions have been processed.

like image 32
Joe Stefanelli Avatar answered Oct 01 '22 18:10

Joe Stefanelli


Your execution plan should make this clear; the JOIN takes precedence, after which the results are filtered.

like image 36
Paul Sonier Avatar answered Oct 01 '22 18:10

Paul Sonier