Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Explanation needed for missing rows with left join and count()

Can someone please help me understand the following behavior that occurs when I add a WHERE clause to a query that has a LEFT JOIN with COUNT(*)?

I have two tables:

TABLE 1: customers
customer_id | name
------------------
1 | Bob
2 | James
3 | Fred

TABLE 2: orders
order_id | customer_id | order_timestamp
----------------------------------------
1000 | 1 | 2011-01-01 00:00
1001 | 1 | 2011-01-05 00:00
1002 | 2 | 2011-01-10 00:00

Now the following query tells me how many orders each customer placed:

select c.customer_id, count(o.order_id)
from customers c
left join orders o using (customer_id)
group by 1

customer_id | count
-------------------
1 | 2
2 | 1
3 | 0

This works great BUT if I add a WHERE clause to the query, the query no longer outputs counts of zeroes for customers who did not place any orders even though I'm doing a LEFT JOIN:

select c.customer_id, count(o.order_id)
from customers c
left join orders o using (customer_id)
where o.order_timestamp >= '2011-01-05'
group by 1

customer_id | count
-------------------
1 | 1
2 | 1

Now if I move the WHERE condition as part of the LEFT JOIN like the following, I get back my zero counts for customers who did not place orders:

select c.customer_id, count(o.order_id)
from customers c
left join orders o on (c.customer_id = o.customer_id) and (o.order_timestamp >= '2011-01-05')
group by 1

I'm confused at why the second query does not work, but the third one does? Can someone please provide me with an explanation? Also not sure if this matters, but I'm using postgres. Thanks!

like image 596
DeadMonkey Avatar asked Jun 04 '11 04:06

DeadMonkey


2 Answers

Placement of filter criteria matters when dealing with OUTER joins (RIGHT, LEFT). Criteria in the ON clause of an OUTER JOIN is applied before the JOIN; criteria in the WHERE clause is applied after the JOIN -- applied against the resultset that uses the JOIN.

   SELECT c.customer_id, 
          COUNT(o.order_id)
     FROM CUSTOMERS c
LEFT JOIN ORDERS o ON o.customer_id - c.customer_id
                  AND o.order_timestamp >= '2011-01-05'
 GROUP BY c.customer_id

Ordinals

Ordinals, meaning using a numeric value that refers to the numeric position of the columns in the SELECT clause, is not a recommended practice. If anyone changes the query -- say to add a column -- it could drastically affect your query.

like image 102
OMG Ponies Avatar answered Oct 11 '22 19:10

OMG Ponies


This is because NULL is not greater than or equal to anything; If you change your WHERE clause to where o.order_timestamp is null or o.order_timestamp >= '2011-01-05' then you will get the same behavior as your join clause limit.

Note though - I would recommend the join clause approach, as it matches more closely what you are trying to do. Also the change to the WHERE clause I mentioned above will only work if the order_timestamp column is not nullable -- if it is then you should use a different column for the null check (eg, where o.primarykey is null or o.order_timestamp >= '2011-01-05').

like image 32
Chris Shaffer Avatar answered Oct 11 '22 19:10

Chris Shaffer