
SQL Outer Join Filtering Conditions in ON versus WHERE

Tags:

sql

left-join

Why are the following queries different? I want a LEFT OUTER join, but need to filter the children with a condition. I thought these queries were essentially the same (just different syntax), but I get different results if I put the condition in ON versus WHERE:

-- Query 1: Filter in WHERE
SELECT  p.ID, p.Name, c.ID, c.Name, c.ParentID
FROM    @Parent p
  LEFT OUTER JOIN @Child c
    ON (p.ID = c.ParentID)
WHERE   c.ID IS NULL OR c.Name = 'T';

-- Query 2: Filter in ON
SELECT  p.ID, p.Name, c.ID, c.Name, c.ParentID
FROM    @Parent p
  LEFT OUTER JOIN @Child c
    ON (p.ID = c.ParentID AND c.Name = 'T');

I started with Query 2 but it showed all of the parents in the results, not the subset with matching children, so I switched to Query 1. Here is an example:

DECLARE @Parent TABLE (
     ID           int           IDENTITY(1, 1) PRIMARY KEY
  ,  Name         nvarchar(40)  NOT NULL
);

DECLARE @Child TABLE (
     ID           int           IDENTITY(1, 1) PRIMARY KEY
  ,  Name         nvarchar(40)  NOT NULL
  ,  ParentID     int               NULL
);

-- Parents
INSERT  @Parent (Name)
VALUES  ('A'), ('B'), ('C'), ('D')
;

-- Children: permutations to parents.
-- NOTE: 'D' has no children
INSERT  @Child (Name, ParentID)
VALUES  ('T', 1)
    ,             ('U', 2)
    ,   ('V', 1), ('V', 2)
    ,                       ('W', 3)
    ,   ('X', 1),           ('X', 3)
    ,             ('Y', 2), ('Y', 3)
    ,   ('Z', 1), ('Z', 2), ('Z', 3)
;

-- Query 1: Filter in WHERE
SELECT  p.ID, p.Name, c.ID, c.Name, c.ParentID
FROM    @Parent p
  LEFT OUTER JOIN @Child c
    ON (p.ID = c.ParentID)
WHERE   c.ID IS NULL OR c.Name = 'T';

-- Query 2: Filter in ON
SELECT  p.ID, p.Name, c.ID, c.Name, c.ParentID
FROM    @Parent p
  LEFT OUTER JOIN @Child c
    ON (p.ID = c.ParentID AND c.Name = 'T');

Query 1: Results

ID  Name  ID    Name  ParentID
1   A     1     T     1
4   D     NULL  NULL  NULL

Query 2: Results

ID  Name  ID    Name  ParentID
1   A     1     T     1
2   B     NULL  NULL  NULL
3   C     NULL  NULL  NULL
4   D     NULL  NULL  NULL

I prefer the style of Query 2 (and I suspect it performs better), so I was surprised when the two queries did not return the same results.

(NOTE: The SQL example with data was added much later for clarification as to why this question is not a duplicate of another question, and to bring it up to current question standards. The sample results make it much clearer that Query 1 returns the parents with 1 or more matching children and parents with no children. Query 2 returns all parents but only matching children. Obviously I understand the difference between the queries now.)

Edit/Summary:

There were some great answers provided here. I had a hard time choosing to whom to award the answer. I decided to go with mdma since it was the first answer and one of the clearest. Based on the supplied answers, here is my summary:

Possible results:

  • A: Parents with no children
  • B: Parents with children
      • B1: Parents with children where no child matches the filter
      • B2: Parents with children where 1 or more match the filter

Query results:

  • Query 1 returns (A, B2)
  • Query 2 returns (A, B1, B2)

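Query 2 always returns every parent because the filter is part of the left join condition: a parent with no child satisfying the whole ON condition still comes back, with NULLs in the child columns. In Query 1, the WHERE clause is evaluated after the left join, so parents whose children all fail the filter are excluded (case B1); parents with no children survive only because of the explicit c.ID IS NULL test.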

Note: in case B1 (Query 2 only), just the parent columns are populated and the child columns are NULL; in case B2, only the child rows matching the filter are returned alongside the parent.
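To make the A / B1 / B2 cases above concrete, here is a minimal sketch that labels each parent in the sample data with its case. It assumes the same table variables and rows as the question; the column alias ResultCase is a name made up for this illustration.

```sql
-- Sample data from the question (table variables must be declared in the same batch).
DECLARE @Parent TABLE (ID int IDENTITY(1, 1) PRIMARY KEY, Name nvarchar(40) NOT NULL);
DECLARE @Child  TABLE (ID int IDENTITY(1, 1) PRIMARY KEY, Name nvarchar(40) NOT NULL, ParentID int NULL);

INSERT  @Parent (Name)
VALUES  ('A'), ('B'), ('C'), ('D');

INSERT  @Child (Name, ParentID)
VALUES  ('T', 1), ('U', 2), ('V', 1), ('V', 2), ('W', 3), ('X', 1), ('X', 3),
        ('Y', 2), ('Y', 3), ('Z', 1), ('Z', 2), ('Z', 3);

-- Label each parent with its case (ResultCase is a hypothetical alias):
--   A  = no children at all
--   B1 = has children, none named 'T'
--   B2 = has children, at least one named 'T'
SELECT  p.ID, p.Name,
        CASE
          WHEN COUNT(c.ID) = 0 THEN 'A'
          WHEN SUM(CASE WHEN c.Name = 'T' THEN 1 ELSE 0 END) = 0 THEN 'B1'
          ELSE 'B2'
        END AS ResultCase
FROM    @Parent p
  LEFT OUTER JOIN @Child c
    ON (p.ID = c.ParentID)
GROUP BY p.ID, p.Name
ORDER BY p.ID;
```

With the sample data this should label parent A as B2, parents B and C as B1, and parent D as A, which lines up with the rows each query returns: Query 1 drops the B1 parents, Query 2 keeps them.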

HLGEM provided a good link (now dead, so using archive.org):

https://web.archive.org/web/20180814131549/http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN

Asked by Ryan on May 21 '10 at 14:05



2 Answers

Yes, there is a huge difference. When you place a filter on the child table in the ON clause of a LEFT JOIN, it is applied while rows are being matched, before the unmatched parent rows are added back with NULLs. When you apply the filter in the WHERE clause, it runs after the LEFT JOIN has produced its result, so it can remove those parent rows again.

In short, the first query will exclude parents that have child rows when none of those children satisfies the filter condition, whereas the second query will always return at least one row for every parent.
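One way to see the "filter first, then join" behaviour is to rewrite Query 2 so the child filter runs in a derived table before the join. This is only a sketch under the question's sample data, and the equivalence holds for ON predicates that reference the child table alone, as c.Name = 'T' does here.

```sql
-- Sample data from the question (table variables must be declared in the same batch).
DECLARE @Parent TABLE (ID int IDENTITY(1, 1) PRIMARY KEY, Name nvarchar(40) NOT NULL);
DECLARE @Child  TABLE (ID int IDENTITY(1, 1) PRIMARY KEY, Name nvarchar(40) NOT NULL, ParentID int NULL);

INSERT  @Parent (Name)
VALUES  ('A'), ('B'), ('C'), ('D');

INSERT  @Child (Name, ParentID)
VALUES  ('T', 1), ('U', 2), ('V', 1), ('V', 2), ('W', 3), ('X', 1), ('X', 3),
        ('Y', 2), ('Y', 3), ('Z', 1), ('Z', 2), ('Z', 3);

-- Query 2 rewritten: reduce the children to the 'T' rows first,
-- then left join the parents to that reduced set.
SELECT  p.ID, p.Name, c.ID, c.Name, c.ParentID
FROM    @Parent p
  LEFT OUTER JOIN (
      SELECT  ID, Name, ParentID
      FROM    @Child
      WHERE   Name = 'T'
  ) c
    ON (p.ID = c.ParentID);
```

This should return the same four rows as Query 2: every parent appears, and only parent A carries a child row. Moving the same predicate to an outer WHERE clause instead would bring back the Query 1 behaviour of discarding parent rows after the join.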

Answered by Thomas on Nov 09 '22 at 18:11



The first query will return cases where the parent has no children or where at least one of the children matches the filter condition. Specifically, parents that have children, none of which match the filter condition, will be omitted.

The second query will return at least one row for every parent. If no child matches the filter condition, NULL will be returned for all of c's columns. This is why you are getting more rows in query 2: parents with children that don't match the filter condition are output with NULL child values, whereas in the first query they are filtered out.
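If you like the ON-clause style of query 2 but want the rows query 1 produces, one possible approach is to keep the filter in ON and state the "parent has no children at all" case explicitly. This is a sketch against the question's sample data, not the only way to write it; the alias x is made up for the example.

```sql
-- Sample data from the question (table variables must be declared in the same batch).
DECLARE @Parent TABLE (ID int IDENTITY(1, 1) PRIMARY KEY, Name nvarchar(40) NOT NULL);
DECLARE @Child  TABLE (ID int IDENTITY(1, 1) PRIMARY KEY, Name nvarchar(40) NOT NULL, ParentID int NULL);

INSERT  @Parent (Name)
VALUES  ('A'), ('B'), ('C'), ('D');

INSERT  @Child (Name, ParentID)
VALUES  ('T', 1), ('U', 2), ('V', 1), ('V', 2), ('W', 3), ('X', 1), ('X', 3),
        ('Y', 2), ('Y', 3), ('Z', 1), ('Z', 2), ('Z', 3);

-- Filter stays in ON (as in query 2); the WHERE clause then keeps
--   * rows where a matching child was found, and
--   * parents that have no children of any kind.
SELECT  p.ID, p.Name, c.ID, c.Name, c.ParentID
FROM    @Parent p
  LEFT OUTER JOIN @Child c
    ON (p.ID = c.ParentID AND c.Name = 'T')
WHERE   c.ID IS NOT NULL
   OR   NOT EXISTS (SELECT 1 FROM @Child x WHERE x.ParentID = p.ID);
```

With the sample data this should return the same two rows as query 1 (parent A with child T, and parent D with NULL child columns), while making the two reasons a row is kept visible in the WHERE clause.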

Answered by mdma on Nov 09 '22 at 16:11
