Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimising a SELECT query that runs slow on Oracle which runs quickly on SQL Server

I'm trying to run the following SQL statement in Oracle, and it takes ages to run:

SELECT orderID FROM tasks WHERE orderID NOT IN 
(SELECT DISTINCT orderID FROM tasks WHERE
 engineer1 IS NOT NULL AND engineer2 IS NOT NULL)

If I run just the sub-part that is in the IN clause, that runs very quickly in Oracle, i.e.

SELECT DISTINCT orderID FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL

Why does the whole statement take such a long time in Oracle? In SQL Server the whole statement runs quickly.

Alternatively is there a simpler/different/better SQL statement I should use?

Some more details about the problem:

  • Each order is made of many tasks
  • Each order will be allocated (one or more of its task will have engineer1 and engineer2 set) or the order can be unallocated (all its task have null values for the engineer fields)
  • I am trying to find all the orderIDs that are unallocated.

Just in case it makes any difference, there are ~120k rows in the table, and 3 tasks per order, so ~40k different orders.

Responses to answers:

  • I would prefer a SQL statement that works in both SQL Server and Oracle.
  • The tasks only has an index on the orderID and taskID.
  • I tried the NOT EXISTS version of the statement but it ran for over 3 minutes before I cancelled it. Perhaps need a JOIN version of the statement?
  • There is an "orders" table as well with the orderID column. But I was trying to simplify the question by not including it in the original SQL statement.

I guess that in the original SQL statement the sub-query is run every time for each row in the first part of the SQL statement - even though it is static and should only need to be run once?

Executing

ANALYZE TABLE tasks COMPUTE STATISTICS;

made my original SQL statement execute much faster.

Although I'm still curious why I have to do this, and if/when I would need to run it again?

The statistics give Oracle's cost-based optimzer information that it needs to determine the efficiency of different execution plans: for example, the number of rowsin a table, the average width of rows, highest and lowest values per column, number of distinct values per column, clustering factor of indexes etc.

In a small database you can just setup a job to gather statistics every night and leave it alone. In fact, this is the default under 10g. For larger implementations you usually have to weigh the stability of the execution plans against the way that the data changes, which is a tricky balance.

Oracle also has a feature called "dynamic sampling" that is used to sample tables to determine relevant statistics at execution time. It's much more often used with data warehouses where the overhead of the sampling it outweighed by the potential performance increase for a long-running query.

like image 844
RickL Avatar asked Sep 23 '08 11:09

RickL


2 Answers

Often this type of problem goes away if you analyze the tables involved (so Oracle has a better idea of the distribution of the data)

ANALYZE TABLE tasks COMPUTE STATISTICS;
like image 92
hamishmcn Avatar answered Sep 19 '22 12:09

hamishmcn


The "IN" - clause is known in Oracle to be pretty slow. In fact, the internal query optimizer in Oracle cannot handle statements with "IN" pretty good. try using "EXISTS":

SELECT orderID FROM tasks WHERE orderID NOT EXISTS 
    (SELECT DISTINCT orderID FROM tasks WHERE
         engineer1 IS NOT NULL AND engineer2 IS NOT NULL)`print("code sample");`

Caution: Please check if the query builds the same data results.

Edith says: ooops, the query is not well formed, but the general idea is correct. Oracle has to fulfill a full table scan for the second (inner) query, build the results and then compare them to the first (outer) query, that's why it's slowing down. Try

SELECT orderID AS oid FROM tasks WHERE NOT EXISTS 
    (SELECT DISTINCT orderID AS oid2 FROM tasks WHERE
         engineer1 IS NOT NULL AND engineer2 IS NOT NULL and oid=oid2)

or something similiar ;-)

like image 38
Georgi Avatar answered Sep 20 '22 12:09

Georgi