Oracle top-n query sort performance

Tags:

I'm new to sql optimization and I'm trying to understand why having more than 1 item in the IN clause results in large performance hit and how can I prevent it if it is even possible. Below is more or less what I'm working with. The second query is currently what I have and I'm looking to improve the performance. In real life TABLE_1 has millions of rows and the sorting part of the plan has a CPU cost of 21M.

SELECT 
    TOPNWRAPPER.*, 
    TABLE_2.X, 
    TABLE_2.Y 
FROM 
    TABLE_2, 
    ( 
        SELECT 
            * 
        FROM 
            ( 
                SELECT 
                    /*+ index (TABLE_1 TABLE_1_B_E_F_ID) */ 
                    TABLE_1.ID, 
                    TABLE_1.C, 
                    TABLE_1.B, 
                    TABLE_1.E, 
                    TABLE_1.F
                FROM 
                    TABLE_1 
                WHERE 
                    ( TABLE_1.F IN ( ‘STATE1’ ) ) AND 
                    ( TABLE_1.B= 'SOMETEXT' ) AND 
                    ( TABLE_1.C=1 ) AND 
                    ( TABLE_1.E= 'IN' ) AND 
                    ( TABLE_1.D IS NULL ) 
                ORDER BY 
                    TABLE_1.ID 
            ) 
        WHERE 
            ( ROWNUM <= 150 ) 
    ) TOPNWRAPPER 
WHERE 
    ( TOPNWRAPPER.ID = TABLE_2.T1_ID_FK ) 
ORDER BY 
    TOPNWRAPPER.ID ASC

Stats:

|--------------------------------------------------------------------------------------------------------------------------|
|| Id  | Operation                        | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers ||
|--------------------------------------------------------------------------------------------------------------------------|
||   0 ||SELECT STATEMENT                 |                             |      1 |        |    120 |00:00:00.01 |     965 ||
||   1 |||NESTED LOOPS                    |                             |      1 |        |    120 |00:00:00.01 |     965 ||
||   2 ||||NESTED LOOPS                   |                             |      1 |      1 |    120 |00:00:00.01 |     845 ||
||   3 |||||VIEW                          |                             |      1 |      1 |    120 |00:00:00.01 |     245 ||
||*  4 ||||||COUNT STOPKEY                |                             |      1 |        |    120 |00:00:00.01 |     245 ||
||   5 |||||||VIEW                        |                             |      1 |      1 |    120 |00:00:00.01 |     245 ||
||*  6 ||||||||TABLE ACCESS BY INDEX ROWID| TABLE_1                     |      1 |      1 |    120 |00:00:00.01 |     245 ||
||*  7 |||||||||INDEX RANGE SCAN          | TABLE_1_B_E_F_ID            |      1 |     25 |    120 |00:00:00.01 |     125 ||
||*  8 |||||INDEX RANGE SCAN              | TABLE_2_T1_ID_FK            |    120 |      1 |    120 |00:00:00.01 |     600 ||
||   9 ||||TABLE ACCESS BY INDEX ROWID    | TABLE_2                     |    120 |      1 |    120 |00:00:00.01 |     120 ||
|--------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                          |
|Predicate Information (identified by operation id):                                                                       |
|---------------------------------------------------                                                                       |
|                                                                                                                          |
|   4 - filter(ROWNUM<=150)                                                                                                |
|   6 - filter((“TABLE_1”.”C”=1 AND “TABLE_1”.”D” IS NULL))                                                                |
|   7 - access(“TABLE_1”.”B”='SOMETEXT' AND                                                                                |
|              “TABLE_1”.”E”=‘IN' AND “TABLE_1”.”F”=’STATE1’)                                                              |
|   8 - access(“TOPNWRAPPER”.”ID”=“TABLE_2”.”T1_ID_FK”)                                                                         |
+--------------------------------------------------------------------------------------------------------------------------+

When I update the query to have 'STATE2' in the IN clause an additional sort step is added to the plan.

SELECT 
    TOPNWRAPPER.*, 
    TABLE_2.X, 
    TABLE_2.Y 
FROM 
    TABLE_2, 
    ( 
        SELECT 
            * 
        FROM 
            ( 
                SELECT 
                    /*+ index (TABLE_1 TABLE_1_B_E_F_ID) */ 
                    TABLE_1.ID, 
                    TABLE_1.C, 
                    TABLE_1.B, 
                    TABLE_1.E, 
                    TABLE_1.F
                FROM 
                    TABLE_1 
                WHERE 
                    ( TABLE_1.F IN ( 'STATE1', 'STATE2' ) ) AND 
                    ( TABLE_1.B= 'SOMETEXT' ) AND 
                    ( TABLE_1.C=1 ) AND 
                    ( TABLE_1.E= 'IN' ) AND 
                    ( TABLE_1.D IS NULL ) 
                ORDER BY 
                    TABLE_1.ID 
            ) 
        WHERE 
            ( ROWNUM <= 150 ) 
    ) TOPNWRAPPER 
WHERE 
    ( TOPNWRAPPER.ID = TABLE_2.T1_ID_FK ) 
ORDER BY 
    TOPNWRAPPER.ID ASC

Stats:

|-------------------------------------------------------------------------------------------------------------------------------------------------------|
|| Id  | Operation                          | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem ||
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
||   0 ||SELECT STATEMENT                   |                             |      1 |        |    150 |00:00:00.01 |    1076 |       |       |          ||
||   1 |||NESTED LOOPS                      |                             |      1 |        |    150 |00:00:00.01 |    1076 |       |       |          ||
||   2 ||||NESTED LOOPS                     |                             |      1 |      1 |    150 |00:00:00.01 |     926 |       |       |          ||
||   3 |||||VIEW                            |                             |      1 |      1 |    150 |00:00:00.01 |     176 |       |       |          ||
||*  4 ||||||COUNT STOPKEY                  |                             |      1 |        |    150 |00:00:00.01 |     176 |       |       |          ||
||   5 |||||||VIEW                          |                             |      1 |      1 |    150 |00:00:00.01 |     176 |       |       |          ||
||*  6 ||||||||SORT ORDER BY STOPKEY        |                             |      1 |      1 |    150 |00:00:00.01 |     176 | 15360 | 15360 |14336  (0)||
||   7 |||||||||INLIST ITERATOR             |                             |      1 |        |    165 |00:00:00.01 |     176 |       |       |          ||
||*  8 ||||||||||TABLE ACCESS BY INDEX ROWID| TABLE_1                     |      2 |      1 |    165 |00:00:00.01 |     176 |       |       |          ||
||*  9 |||||||||||INDEX RANGE SCAN          | TABLE_1_B_E_F_ID            |      2 |     50 |    165 |00:00:00.01 |      11 |       |       |          ||
||* 10 |||||INDEX RANGE SCAN                | TABLE_2_T1_ID_FK            |    150 |      1 |    150 |00:00:00.01 |     750 |       |       |          ||
||  11 ||||TABLE ACCESS BY INDEX ROWID      | TABLE_2                     |    150 |      1 |    150 |00:00:00.01 |     150 |       |       |          ||
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                                                       |
|Predicate Information (identified by operation id):                                                                                                    |
|---------------------------------------------------                                                                                                    |
|                                                                                                                                                       |
|   4 - filter(ROWNUM<=150)                                                                                                                             |
|   6 - filter(ROWNUM<=150)                                                                                                                             |
|   8 - filter((“TABLE_1”.”C”=1 AND “TABLE_1”.”D” IS NULL))                                         |
|   9 - access(“TABLE_1”.”B”='SOMETEXT' AND                                                                                |
|              “TABLE_1”.”E”='IN' AND ((“TABLE_1”.”F”='STATE1') OR (“TABLE_1”.”F”='STATE2'))                                              |
|  10 - access(“TOPNWRAPPER”.”ID”=“TABLE_2”.”T1_ID_FK”)                                                              |
|                                                                                                                                                       |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+

I've been looking around for a few days now. One suggestion I tried was using the hint /*+ USE_CONCAT (OR_PREDICATES(1)) */ and that helps a bit by reducing the mem usage by half, but it did not completely eliminate the problem.

EDIT: Was looking around (http://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by#tip-ixord-full) and think this might be due to the order by. If I change the order by statements to be: TABLE_1.F,TABLE_1.ID and TOPNWRAPPER.F,TOPNWRAPPER.ID ASC then the sort operation goes away, unfortunately I need the top n rows based on ID. Alternatively I tried making a new index on (ID F) to test with and it did also remove the sort operation, but ID is unique per row making the table access operations less effective.

EDIT 2:

OPERATION      |OPTION           |CPU COST
--------------------------------------------
SORT           |ORDER BY STOPKEY |21042774
|NESTED LOOPS  |OUTER            |56052
||TABLE ACCESS |BY INDEX ROWID   |38980
|||INDEX       |RANGE SCAN       |30086

927

asked Aug 25 '16 02:08

Lex

1 Answers

The performance difference probably doesn't matter. The execution plan difference is because multi-column index accesses are only implicitly sorted if the leading columns use equality conditions.

Performance Difference

Don't worry too much about the cost of an execution plan. Even though it's called the "Cost Based Optimizer", the cost is a weird number that only a few people in the world fully understand.

One reason why it is complicated to compare explain plan costs is that the total cost is sometimes less than one of the child operation costs. As I explain in my answer here, this can happen with a COUNT STOPKEY operation. This is Oracle's way of saying "this child operation would cost this huge amount, but the COUNT STOPKEY will probably cut it off before it gets that high". It's usually best to compare the top cost of the plan, but even that number can sometimes be misleading, as other examples in that answer demonstrate.

Which means that normally the run time is the only thing that matters. If the A-Time (actual time) is only 0.1 seconds for both queries then your job is probably done here.

Execution Plan Difference

The difference in the execution plans are caused by the ways multi-column indexes are stored and accessed. Sometimes when an index is scanned the results will be automatically stored and sometimes they will not. That's why one plan has COUNT STOPKEY and the other has the more expensive SORT ORDER BY STOPKEY.

To demonstrate this plan difference, create a simple table and index with just 2 columns and 4 rows:

create table test1 as 
select 1 a, 10 b from dual union all
select 1 a, 30 b from dual union all
select 2 a, 20 b from dual union all
select 2 a, 40 b from dual;

create index test1_idx on test1(a, b);

begin
    dbms_stats.gather_table_stats(user, 'TEST1');
end;
/

Below is a simplified idea of how the index is stored. The data is stored ordered by the leading column first, then ordered by the trailing column.

               +----+
        +------+Root+-------+
        |      +----+       |
        |                   |
      +-v-+               +-v-+
   +--+A=1+--+         +--+A=2+--+
   |  +---+  |         |  +---+  |
   |         |         |         |
 +-v--+   +--v-+     +-v--+   +--v-+
 |B=10|   |B=30|     |B=20|   |B=40|
 +----+   +----+     +----+   +----+

If the query only accesses one value in the leading column A, then it can read the values from column B in order without any extra effort. Oracle goes to one of the A blocks and then reads the B blocks in order without even trying.

Note how this query has an ORDER BY but there's no SORT in the execution plan.

explain plan for select * from test1 where a = 1 and b > 0 order by b;
select * from table(dbms_xplan.display(format => 'basic'));

Plan hash value: 598212486

--------------------------------------
| Id  | Operation        | Name      |
--------------------------------------
|   0 | SELECT STATEMENT |           |
|   1 |  INDEX RANGE SCAN| TEST1_IDX |
--------------------------------------

But if the query accesses more than one value in the leading column A, the results from B will not necessarily be retrieved in order. Oracle may read the A blocks in order, but the B block order is only true for one A value.

Now, an extra SORT ORDER BY operation shows up in the execution plan.

explain plan for select * from test1 where a in (1,2) and b > 0 order by b;
select * from table(dbms_xplan.display(format => 'basic'));

Plan hash value: 704117715

----------------------------------------
| Id  | Operation          | Name      |
----------------------------------------
|   0 | SELECT STATEMENT   |           |
|   1 |  SORT ORDER BY     |           |
|   2 |   INLIST ITERATOR  |           |
|   3 |    INDEX RANGE SCAN| TEST1_IDX |
----------------------------------------

This is why changing column1 = value1 to column1 in (value1, value2) may add an extra SORT operation.

answered Oct 16 '22 03:10

Jon Heller

Related questions
                            
                                How to make a UNION with Doctrine?
                            
                                Get the top n results per group [duplicate]
                            
                                How can I make this query sargable?
                            
                                EntityFramework package version="6.1.3" and Web config version 6.0.0.0?
                            
                                MySQL grouping isn't respecting ORDER BY
                            
                                Combine query that relies on resultset of another
                            
                                @SqlResultSetMapping columns : entities with sub-entities
                            
                                SQL max() function with a where clause and group by does not use the index efficiently
                            
                                Connecting tables when querying in MySQL
                            
                                Can a WHERE clause predicate evaluate to NULL?
                            
                                Unescape a string with escaped newlines and carriage returns
                            
                                SQL Regex - Replace with substring from another field
                            
                                Rails - Order by the average of an association
                            
                                Nicer way to insert on multiple tables, multiple retuned values from SQL WITH statement
                            
                                How to find out when a column was changed in Oracle DB?
                            
                                pandas read_sql return query string with arguments passed
                            
                                UNION of non-nullable columns is nullable
                            
                                SQL Server 2008 Merge Statement Multiple Match Conditions
                            
                                Materialized view fast refresh with HAVING clause?
                            
                                How can you auto-increment a non primary ID column scoped to another table?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Oracle top-n query sort performance

Tags:

sql

oracle

query-optimization

oracle12c

Lex

People also ask

1 Answers

Jon Heller

Recent Activity

Donate For Us