I'm trying to run the following SQL statement in Oracle, and it takes ages to run:
SELECT orderID FROM tasks WHERE orderID NOT IN
(SELECT DISTINCT orderID FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL)
If I run just the sub-part that is in the IN clause, that runs very quickly in Oracle, i.e.
SELECT DISTINCT orderID FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL
Why does the whole statement take such a long time in Oracle? In SQL Server the whole statement runs quickly.
Alternatively is there a simpler/different/better SQL statement I should use?
Some more details about the problem:
Just in case it makes any difference, there are ~120k rows in the table, and 3 tasks per order, so ~40k different orders.
Responses to answers:
I guess that in the original SQL statement the sub-query is run every time for each row in the first part of the SQL statement - even though it is static and should only need to be run once?
Executing
ANALYZE TABLE tasks COMPUTE STATISTICS;
made my original SQL statement execute much faster.
Although I'm still curious why I have to do this, and if/when I would need to run it again?
The statistics give Oracle's cost-based optimzer information that it needs to determine the efficiency of different execution plans: for example, the number of rowsin a table, the average width of rows, highest and lowest values per column, number of distinct values per column, clustering factor of indexes etc.
In a small database you can just setup a job to gather statistics every night and leave it alone. In fact, this is the default under 10g. For larger implementations you usually have to weigh the stability of the execution plans against the way that the data changes, which is a tricky balance.
Oracle also has a feature called "dynamic sampling" that is used to sample tables to determine relevant statistics at execution time. It's much more often used with data warehouses where the overhead of the sampling it outweighed by the potential performance increase for a long-running query.
Often this type of problem goes away if you analyze the tables involved (so Oracle has a better idea of the distribution of the data)
ANALYZE TABLE tasks COMPUTE STATISTICS;
The "IN" - clause is known in Oracle to be pretty slow. In fact, the internal query optimizer in Oracle cannot handle statements with "IN" pretty good. try using "EXISTS":
SELECT orderID FROM tasks WHERE orderID NOT EXISTS
(SELECT DISTINCT orderID FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL)`print("code sample");`
Caution: Please check if the query builds the same data results.
Edith says: ooops, the query is not well formed, but the general idea is correct. Oracle has to fulfill a full table scan for the second (inner) query, build the results and then compare them to the first (outer) query, that's why it's slowing down. Try
SELECT orderID AS oid FROM tasks WHERE NOT EXISTS
(SELECT DISTINCT orderID AS oid2 FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL and oid=oid2)
or something similiar ;-)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With