I am wondering why when you join two tables on a key within an exists subquery the join has to happen within the WHERE clause instead of the FROM clause.
This is my example:
Join within FROM clause:
SELECT payer_id
FROM Population1
WHERE NOT EXISTS
(Select *
From Population2 join Population1
On Population2.payer_id = Population1.payer_id)
Join within WHERE clause:
SELECT payer_id
FROM Population1
WHERE NOT EXISTS
(Select *
From Population2
WHERE Population2.payer_id = Population1.payer_id)
The first query gives me 0 results, which I know is incorrect, while the second query gives the the thousands of results I am expecting to see.
Could someone just explain to me why where the join happens in an EXISTS subquery matters? If you take the subqueries without the parent query and run them they literally give you the same result.
It would help me a lot to remember to not continue to make this mistake when using exists.
Thanks in advance.
You need to understand the distinction between a regular subquery and a correlated subquery.
Using your examples, this should be easy. The first where
clause is:
where not exists (Select 1
from Population2 join
Population1
on Population2.payer_id = Population1.payer_id
)
This condition does exactly what it says it is doing. The subquery has no connection to the outer query. So, the not exists
will either filter out all rows or keep all rows.
In this case, the engine runs the subquery and determines that at least one row is returned. Hence, not exists
returns false in all cases, and the nothing is returned.
In the second case, the subquery is a correlated subquery. So, for each row in population1
the subquery is run using the value of Population1.payer_id
. In some cases, matching rows exist in Population2
; these are filtered out. In other cases, matching rows do not exist; these are in the result set.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With