What is the most efficient way to write a select statement similar to the below.
SELECT *
FROM Orders
WHERE Orders.Order_ID not in (Select Order_ID FROM HeldOrders)
The gist is you want the records from one table when the item is not in another table.
Use an INNER JOIN to join with your max ID's. Assuming the ID column is indexed, this is likely as fast as its going to get. MySQL will create many temporary tables, while in my example there will be only one.
When using NOT IN , the subquery returns a list of zero or more values in the outer query where the comparison column does not match any of the values returned from the subquery. Using the previous example, NOT IN returns all the products that are not supplied from MA.
An alternative for IN and EXISTS is an INNER JOIN, while a LEFT OUTER JOIN with a WHERE clause checking for NULL values can be used as an alternative for NOT IN and NOT EXISTS.
If you can write your query either way, IN is preferred as far as I'm concerned. Show activity on this post. Same for the other one, with 8 times = instead. So yes, the first one will be faster, less comparisons to be done.
For starters, a link to an old article in my blog on how NOT IN
predicate works in SQL Server
(and in other systems too):
SQL Server
You can rewrite it as follows:
SELECT *
FROM Orders o
WHERE NOT EXISTS
(
SELECT NULL
FROM HeldOrders ho
WHERE ho.OrderID = o.OrderID
)
, however, most databases will treat these queries the same.
Both these queries will use some kind of an ANTI JOIN
.
This is useful for SQL Server
if you want to check two or more columns, since SQL Server
does not support this syntax:
SELECT *
FROM Orders o
WHERE (col1, col2) NOT IN
(
SELECT col1, col2
FROM HeldOrders ho
)
Note, however, that NOT IN
may be tricky due to the way it treats NULL
values.
If Held.Orders
is nullable, no records are found and the subquery returns but a single NULL
, the whole query will return nothing (both IN
and NOT IN
will evaluate to NULL
in this case).
Consider these data:
Orders:
OrderID
---
1
HeldOrders:
OrderID
---
2
NULL
This query:
SELECT *
FROM Orders o
WHERE OrderID NOT IN
(
SELECT OrderID
FROM HeldOrders ho
)
will return nothing, which is probably not what you'd expect.
However, this one:
SELECT *
FROM Orders o
WHERE NOT EXISTS
(
SELECT NULL
FROM HeldOrders ho
WHERE ho.OrderID = o.OrderID
)
will return the row with OrderID = 1
.
Note that LEFT JOIN
solutions proposed by others is far from being a most efficient solution.
This query:
SELECT *
FROM Orders o
LEFT JOIN
HeldOrders ho
ON ho.OrderID = o.OrderID
WHERE ho.OrderID IS NULL
will use a filter condition that will need to evaluate and filter out all matching rows which can be numerius
An ANTI JOIN
method used by both IN
and EXISTS
will just need to make sure that a record does not exists once per each row in Orders
, so it will eliminate all possible duplicates first:
NESTED LOOPS ANTI JOIN
and MERGE ANTI JOIN
will just skip the duplicates when evaluating HeldOrders
.HASH ANTI JOIN
will eliminate duplicates when building the hash table."Most efficient" is going to be different depending on tables sizes, indexes, and so on. In other words it's going to differ depending on the specific case you're using.
There are three ways I commonly use to accomplish what you want, depending on the situation.
1. Your example works fine if Orders.order_id is indexed, and HeldOrders is fairly small.
2. Another method is the "correlated subquery" which is a slight variation of what you have...
SELECT *
FROM Orders o
WHERE Orders.Order_ID not in (Select Order_ID
FROM HeldOrders h
where h.order_id = o.order_id)
Note the addition of the where clause. This tends to work better when HeldOrders has a large number of rows. Order_ID needs to be indexed in both tables.
3. Another method I use sometimes is left outer join...
SELECT *
FROM Orders o
left outer join HeldOrders h on h.order_id = o.order_id
where h.order_id is null
When using the left outer join, h.order_id will have a value in it matching o.order_id when there is a matching row. If there isn't a matching row, h.order_id will be NULL. By checking for the NULL values in the where clause you can filter on everything that doesn't have a match.
Each of these variations can work more or less efficiently in various scenarios.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With