I'm new to sql optimization and I'm trying to understand why having more than 1 item in the IN clause results in large performance hit and how can I prevent it if it is even possible. Below is more or less what I'm working with. The second query is currently what I have and I'm looking to improve the performance. In real life TABLE_1 has millions of rows and the sorting part of the plan has a CPU cost of 21M.
SELECT
TOPNWRAPPER.*,
TABLE_2.X,
TABLE_2.Y
FROM
TABLE_2,
(
SELECT
*
FROM
(
SELECT
/*+ index (TABLE_1 TABLE_1_B_E_F_ID) */
TABLE_1.ID,
TABLE_1.C,
TABLE_1.B,
TABLE_1.E,
TABLE_1.F
FROM
TABLE_1
WHERE
( TABLE_1.F IN ( ‘STATE1’ ) ) AND
( TABLE_1.B= 'SOMETEXT' ) AND
( TABLE_1.C=1 ) AND
( TABLE_1.E= 'IN' ) AND
( TABLE_1.D IS NULL )
ORDER BY
TABLE_1.ID
)
WHERE
( ROWNUM <= 150 )
) TOPNWRAPPER
WHERE
( TOPNWRAPPER.ID = TABLE_2.T1_ID_FK )
ORDER BY
TOPNWRAPPER.ID ASC
Stats:
|--------------------------------------------------------------------------------------------------------------------------|
|| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers ||
|--------------------------------------------------------------------------------------------------------------------------|
|| 0 ||SELECT STATEMENT | | 1 | | 120 |00:00:00.01 | 965 ||
|| 1 |||NESTED LOOPS | | 1 | | 120 |00:00:00.01 | 965 ||
|| 2 ||||NESTED LOOPS | | 1 | 1 | 120 |00:00:00.01 | 845 ||
|| 3 |||||VIEW | | 1 | 1 | 120 |00:00:00.01 | 245 ||
||* 4 ||||||COUNT STOPKEY | | 1 | | 120 |00:00:00.01 | 245 ||
|| 5 |||||||VIEW | | 1 | 1 | 120 |00:00:00.01 | 245 ||
||* 6 ||||||||TABLE ACCESS BY INDEX ROWID| TABLE_1 | 1 | 1 | 120 |00:00:00.01 | 245 ||
||* 7 |||||||||INDEX RANGE SCAN | TABLE_1_B_E_F_ID | 1 | 25 | 120 |00:00:00.01 | 125 ||
||* 8 |||||INDEX RANGE SCAN | TABLE_2_T1_ID_FK | 120 | 1 | 120 |00:00:00.01 | 600 ||
|| 9 ||||TABLE ACCESS BY INDEX ROWID | TABLE_2 | 120 | 1 | 120 |00:00:00.01 | 120 ||
|--------------------------------------------------------------------------------------------------------------------------|
| |
|Predicate Information (identified by operation id): |
|--------------------------------------------------- |
| |
| 4 - filter(ROWNUM<=150) |
| 6 - filter((“TABLE_1”.”C”=1 AND “TABLE_1”.”D” IS NULL)) |
| 7 - access(“TABLE_1”.”B”='SOMETEXT' AND |
| “TABLE_1”.”E”=‘IN' AND “TABLE_1”.”F”=’STATE1’) |
| 8 - access(“TOPNWRAPPER”.”ID”=“TABLE_2”.”T1_ID_FK”) |
+--------------------------------------------------------------------------------------------------------------------------+
When I update the query to have 'STATE2' in the IN clause an additional sort step is added to the plan.
SELECT
TOPNWRAPPER.*,
TABLE_2.X,
TABLE_2.Y
FROM
TABLE_2,
(
SELECT
*
FROM
(
SELECT
/*+ index (TABLE_1 TABLE_1_B_E_F_ID) */
TABLE_1.ID,
TABLE_1.C,
TABLE_1.B,
TABLE_1.E,
TABLE_1.F
FROM
TABLE_1
WHERE
( TABLE_1.F IN ( 'STATE1', 'STATE2' ) ) AND
( TABLE_1.B= 'SOMETEXT' ) AND
( TABLE_1.C=1 ) AND
( TABLE_1.E= 'IN' ) AND
( TABLE_1.D IS NULL )
ORDER BY
TABLE_1.ID
)
WHERE
( ROWNUM <= 150 )
) TOPNWRAPPER
WHERE
( TOPNWRAPPER.ID = TABLE_2.T1_ID_FK )
ORDER BY
TOPNWRAPPER.ID ASC
Stats:
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
|| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem ||
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
|| 0 ||SELECT STATEMENT | | 1 | | 150 |00:00:00.01 | 1076 | | | ||
|| 1 |||NESTED LOOPS | | 1 | | 150 |00:00:00.01 | 1076 | | | ||
|| 2 ||||NESTED LOOPS | | 1 | 1 | 150 |00:00:00.01 | 926 | | | ||
|| 3 |||||VIEW | | 1 | 1 | 150 |00:00:00.01 | 176 | | | ||
||* 4 ||||||COUNT STOPKEY | | 1 | | 150 |00:00:00.01 | 176 | | | ||
|| 5 |||||||VIEW | | 1 | 1 | 150 |00:00:00.01 | 176 | | | ||
||* 6 ||||||||SORT ORDER BY STOPKEY | | 1 | 1 | 150 |00:00:00.01 | 176 | 15360 | 15360 |14336 (0)||
|| 7 |||||||||INLIST ITERATOR | | 1 | | 165 |00:00:00.01 | 176 | | | ||
||* 8 ||||||||||TABLE ACCESS BY INDEX ROWID| TABLE_1 | 2 | 1 | 165 |00:00:00.01 | 176 | | | ||
||* 9 |||||||||||INDEX RANGE SCAN | TABLE_1_B_E_F_ID | 2 | 50 | 165 |00:00:00.01 | 11 | | | ||
||* 10 |||||INDEX RANGE SCAN | TABLE_2_T1_ID_FK | 150 | 1 | 150 |00:00:00.01 | 750 | | | ||
|| 11 ||||TABLE ACCESS BY INDEX ROWID | TABLE_2 | 150 | 1 | 150 |00:00:00.01 | 150 | | | ||
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| |
|Predicate Information (identified by operation id): |
|--------------------------------------------------- |
| |
| 4 - filter(ROWNUM<=150) |
| 6 - filter(ROWNUM<=150) |
| 8 - filter((“TABLE_1”.”C”=1 AND “TABLE_1”.”D” IS NULL)) |
| 9 - access(“TABLE_1”.”B”='SOMETEXT' AND |
| “TABLE_1”.”E”='IN' AND ((“TABLE_1”.”F”='STATE1') OR (“TABLE_1”.”F”='STATE2')) |
| 10 - access(“TOPNWRAPPER”.”ID”=“TABLE_2”.”T1_ID_FK”) |
| |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
I've been looking around for a few days now. One suggestion I tried was using the hint /*+ USE_CONCAT (OR_PREDICATES(1)) */
and that helps a bit by reducing the mem usage by half, but it did not completely eliminate the problem.
EDIT: Was looking around (http://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by#tip-ixord-full) and think this might be due to the order by. If I change the order by statements to be: TABLE_1.F,TABLE_1.ID
and TOPNWRAPPER.F,TOPNWRAPPER.ID ASC
then the sort operation goes away, unfortunately I need the top n rows based on ID. Alternatively I tried making a new index on (ID F) to test with and it did also remove the sort operation, but ID is unique per row making the table access operations less effective.
EDIT 2:
OPERATION |OPTION |CPU COST
--------------------------------------------
SORT |ORDER BY STOPKEY |21042774
|NESTED LOOPS |OUTER |56052
||TABLE ACCESS |BY INDEX ROWID |38980
|||INDEX |RANGE SCAN |30086
Partitioning your data and creating local partitioned indexes can improve your query performance. On a partitioned table, each partition has its own set of index tables. Effectively, there are multiple indexes, but the results from each are combined as necessary to produce the final result set.
You cannot have more than 1000 literals in an IN clause. You can, however, have SELECT statements in your IN clause which can return an unlimited number of elements i.e. You might try using 'between' clause replacing 'in'... check documentation for correct syntax on using between.
Top-N queries provide a method for limiting the number of rows returned from ordered sets of data. They are extremely useful when you want to return the top or bottom "N" number of rows from a set or when you are paging through data.
The performance difference probably doesn't matter. The execution plan difference is because multi-column index accesses are only implicitly sorted if the leading columns use equality conditions.
Performance Difference
Don't worry too much about the cost of an execution plan. Even though it's called the "Cost Based Optimizer", the cost is a weird number that only a few people in the world fully understand.
One reason why it is complicated to compare explain plan costs is that the total cost is sometimes less than one of the child operation costs. As I explain in my answer here, this can happen with a COUNT STOPKEY
operation. This is Oracle's way of saying "this child operation would cost this huge amount, but the COUNT STOPKEY will probably cut it off before it gets that high". It's usually best to compare the top cost of the plan, but even that number can sometimes be misleading, as other examples in that answer demonstrate.
Which means that normally the run time is the only thing that matters. If the A-Time (actual time) is only 0.1 seconds for both queries then your job is probably done here.
Execution Plan Difference
The difference in the execution plans are caused by the ways multi-column indexes are stored and accessed. Sometimes when an index is scanned the results will be automatically stored and sometimes they will not. That's why one plan has COUNT STOPKEY
and the other has the more expensive SORT ORDER BY STOPKEY
.
To demonstrate this plan difference, create a simple table and index with just 2 columns and 4 rows:
create table test1 as
select 1 a, 10 b from dual union all
select 1 a, 30 b from dual union all
select 2 a, 20 b from dual union all
select 2 a, 40 b from dual;
create index test1_idx on test1(a, b);
begin
dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
Below is a simplified idea of how the index is stored. The data is stored ordered by the leading column first, then ordered by the trailing column.
+----+
+------+Root+-------+
| +----+ |
| |
+-v-+ +-v-+
+--+A=1+--+ +--+A=2+--+
| +---+ | | +---+ |
| | | |
+-v--+ +--v-+ +-v--+ +--v-+
|B=10| |B=30| |B=20| |B=40|
+----+ +----+ +----+ +----+
If the query only accesses one value in the leading column A, then it can read the values from column B in order without any extra effort. Oracle goes to one of the A blocks and then reads the B blocks in order without even trying.
Note how this query has an ORDER BY
but there's no SORT
in the execution plan.
explain plan for select * from test1 where a = 1 and b > 0 order by b;
select * from table(dbms_xplan.display(format => 'basic'));
Plan hash value: 598212486
--------------------------------------
| Id | Operation | Name |
--------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | INDEX RANGE SCAN| TEST1_IDX |
--------------------------------------
But if the query accesses more than one value in the leading column A, the results from B will not necessarily be retrieved in order. Oracle may read the A blocks in order, but the B block order is only true for one A value.
Now, an extra SORT ORDER BY
operation shows up in the execution plan.
explain plan for select * from test1 where a in (1,2) and b > 0 order by b;
select * from table(dbms_xplan.display(format => 'basic'));
Plan hash value: 704117715
----------------------------------------
| Id | Operation | Name |
----------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT ORDER BY | |
| 2 | INLIST ITERATOR | |
| 3 | INDEX RANGE SCAN| TEST1_IDX |
----------------------------------------
This is why changing column1 = value1
to column1 in (value1, value2)
may add an extra SORT
operation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With