Can someone please help me understand the following behavior that occurs when I add a WHERE clause to a query that has a LEFT JOIN with COUNT(*)?
I have two tables:
TABLE 1: customers
customer_id | name
------------------
1 | Bob
2 | James
3 | Fred
TABLE 2: orders
order_id | customer_id | order_timestamp
----------------------------------------
1000 | 1 | 2011-01-01 00:00
1001 | 1 | 2011-01-05 00:00
1002 | 2 | 2011-01-10 00:00
Now the following query tells me how many orders each customer placed:
select c.customer_id, count(o.order_id)
from customers c
left join orders o using (customer_id)
group by 1
customer_id | count
-------------------
1 | 2
2 | 1
3 | 0
This works great BUT if I add a WHERE clause to the query, the query no longer outputs counts of zeroes for customers who did not place any orders even though I'm doing a LEFT JOIN:
select c.customer_id, count(o.order_id)
from customers c
left join orders o using (customer_id)
where o.order_timestamp >= '2011-01-05'
group by 1
customer_id | count
-------------------
1 | 1
2 | 1
Now if I move the WHERE condition as part of the LEFT JOIN like the following, I get back my zero counts for customers who did not place orders:
select c.customer_id, count(o.order_id)
from customers c
left join orders o on (c.customer_id = o.customer_id) and (o.order_timestamp >= '2011-01-05')
group by 1
I'm confused at why the second query does not work, but the third one does? Can someone please provide me with an explanation? Also not sure if this matters, but I'm using postgres. Thanks!
Placement of filter criteria matters when dealing with OUTER joins (RIGHT, LEFT). Criteria in the ON clause of an OUTER JOIN is applied before the JOIN; criteria in the WHERE clause is applied after the JOIN -- applied against the resultset that uses the JOIN.
SELECT c.customer_id,
COUNT(o.order_id)
FROM CUSTOMERS c
LEFT JOIN ORDERS o ON o.customer_id - c.customer_id
AND o.order_timestamp >= '2011-01-05'
GROUP BY c.customer_id
Ordinals, meaning using a numeric value that refers to the numeric position of the columns in the SELECT clause, is not a recommended practice. If anyone changes the query -- say to add a column -- it could drastically affect your query.
This is because NULL is not greater than or equal to anything; If you change your WHERE clause to where o.order_timestamp is null or o.order_timestamp >= '2011-01-05'
then you will get the same behavior as your join clause limit.
Note though - I would recommend the join clause approach, as it matches more closely what you are trying to do. Also the change to the WHERE clause I mentioned above will only work if the order_timestamp column is not nullable -- if it is then you should use a different column for the null check (eg, where o.primarykey is null or o.order_timestamp >= '2011-01-05'
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With