Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle top-n query sort performance

I'm new to sql optimization and I'm trying to understand why having more than 1 item in the IN clause results in large performance hit and how can I prevent it if it is even possible. Below is more or less what I'm working with. The second query is currently what I have and I'm looking to improve the performance. In real life TABLE_1 has millions of rows and the sorting part of the plan has a CPU cost of 21M.

SELECT 
    TOPNWRAPPER.*, 
    TABLE_2.X, 
    TABLE_2.Y 
FROM 
    TABLE_2, 
    ( 
        SELECT 
            * 
        FROM 
            ( 
                SELECT 
                    /*+ index (TABLE_1 TABLE_1_B_E_F_ID) */ 
                    TABLE_1.ID, 
                    TABLE_1.C, 
                    TABLE_1.B, 
                    TABLE_1.E, 
                    TABLE_1.F
                FROM 
                    TABLE_1 
                WHERE 
                    ( TABLE_1.F IN ( ‘STATE1’ ) ) AND 
                    ( TABLE_1.B= 'SOMETEXT' ) AND 
                    ( TABLE_1.C=1 ) AND 
                    ( TABLE_1.E= 'IN' ) AND 
                    ( TABLE_1.D IS NULL ) 
                ORDER BY 
                    TABLE_1.ID 
            ) 
        WHERE 
            ( ROWNUM <= 150 ) 
    ) TOPNWRAPPER 
WHERE 
    ( TOPNWRAPPER.ID = TABLE_2.T1_ID_FK ) 
ORDER BY 
    TOPNWRAPPER.ID ASC

Stats:

|--------------------------------------------------------------------------------------------------------------------------|
|| Id  | Operation                        | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers ||
|--------------------------------------------------------------------------------------------------------------------------|
||   0 ||SELECT STATEMENT                 |                             |      1 |        |    120 |00:00:00.01 |     965 ||
||   1 |||NESTED LOOPS                    |                             |      1 |        |    120 |00:00:00.01 |     965 ||
||   2 ||||NESTED LOOPS                   |                             |      1 |      1 |    120 |00:00:00.01 |     845 ||
||   3 |||||VIEW                          |                             |      1 |      1 |    120 |00:00:00.01 |     245 ||
||*  4 ||||||COUNT STOPKEY                |                             |      1 |        |    120 |00:00:00.01 |     245 ||
||   5 |||||||VIEW                        |                             |      1 |      1 |    120 |00:00:00.01 |     245 ||
||*  6 ||||||||TABLE ACCESS BY INDEX ROWID| TABLE_1                     |      1 |      1 |    120 |00:00:00.01 |     245 ||
||*  7 |||||||||INDEX RANGE SCAN          | TABLE_1_B_E_F_ID            |      1 |     25 |    120 |00:00:00.01 |     125 ||
||*  8 |||||INDEX RANGE SCAN              | TABLE_2_T1_ID_FK            |    120 |      1 |    120 |00:00:00.01 |     600 ||
||   9 ||||TABLE ACCESS BY INDEX ROWID    | TABLE_2                     |    120 |      1 |    120 |00:00:00.01 |     120 ||
|--------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                          |
|Predicate Information (identified by operation id):                                                                       |
|---------------------------------------------------                                                                       |
|                                                                                                                          |
|   4 - filter(ROWNUM<=150)                                                                                                |
|   6 - filter((“TABLE_1”.”C”=1 AND “TABLE_1”.”D” IS NULL))                                                                |
|   7 - access(“TABLE_1”.”B”='SOMETEXT' AND                                                                                |
|              “TABLE_1”.”E”=‘IN' AND “TABLE_1”.”F”=’STATE1’)                                                              |
|   8 - access(“TOPNWRAPPER”.”ID”=“TABLE_2”.”T1_ID_FK”)                                                                         |
+--------------------------------------------------------------------------------------------------------------------------+

When I update the query to have 'STATE2' in the IN clause an additional sort step is added to the plan.

SELECT 
    TOPNWRAPPER.*, 
    TABLE_2.X, 
    TABLE_2.Y 
FROM 
    TABLE_2, 
    ( 
        SELECT 
            * 
        FROM 
            ( 
                SELECT 
                    /*+ index (TABLE_1 TABLE_1_B_E_F_ID) */ 
                    TABLE_1.ID, 
                    TABLE_1.C, 
                    TABLE_1.B, 
                    TABLE_1.E, 
                    TABLE_1.F
                FROM 
                    TABLE_1 
                WHERE 
                    ( TABLE_1.F IN ( 'STATE1', 'STATE2' ) ) AND 
                    ( TABLE_1.B= 'SOMETEXT' ) AND 
                    ( TABLE_1.C=1 ) AND 
                    ( TABLE_1.E= 'IN' ) AND 
                    ( TABLE_1.D IS NULL ) 
                ORDER BY 
                    TABLE_1.ID 
            ) 
        WHERE 
            ( ROWNUM <= 150 ) 
    ) TOPNWRAPPER 
WHERE 
    ( TOPNWRAPPER.ID = TABLE_2.T1_ID_FK ) 
ORDER BY 
    TOPNWRAPPER.ID ASC

Stats:

|-------------------------------------------------------------------------------------------------------------------------------------------------------|
|| Id  | Operation                          | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem ||
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
||   0 ||SELECT STATEMENT                   |                             |      1 |        |    150 |00:00:00.01 |    1076 |       |       |          ||
||   1 |||NESTED LOOPS                      |                             |      1 |        |    150 |00:00:00.01 |    1076 |       |       |          ||
||   2 ||||NESTED LOOPS                     |                             |      1 |      1 |    150 |00:00:00.01 |     926 |       |       |          ||
||   3 |||||VIEW                            |                             |      1 |      1 |    150 |00:00:00.01 |     176 |       |       |          ||
||*  4 ||||||COUNT STOPKEY                  |                             |      1 |        |    150 |00:00:00.01 |     176 |       |       |          ||
||   5 |||||||VIEW                          |                             |      1 |      1 |    150 |00:00:00.01 |     176 |       |       |          ||
||*  6 ||||||||SORT ORDER BY STOPKEY        |                             |      1 |      1 |    150 |00:00:00.01 |     176 | 15360 | 15360 |14336  (0)||
||   7 |||||||||INLIST ITERATOR             |                             |      1 |        |    165 |00:00:00.01 |     176 |       |       |          ||
||*  8 ||||||||||TABLE ACCESS BY INDEX ROWID| TABLE_1                     |      2 |      1 |    165 |00:00:00.01 |     176 |       |       |          ||
||*  9 |||||||||||INDEX RANGE SCAN          | TABLE_1_B_E_F_ID            |      2 |     50 |    165 |00:00:00.01 |      11 |       |       |          ||
||* 10 |||||INDEX RANGE SCAN                | TABLE_2_T1_ID_FK            |    150 |      1 |    150 |00:00:00.01 |     750 |       |       |          ||
||  11 ||||TABLE ACCESS BY INDEX ROWID      | TABLE_2                     |    150 |      1 |    150 |00:00:00.01 |     150 |       |       |          ||
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                                                       |
|Predicate Information (identified by operation id):                                                                                                    |
|---------------------------------------------------                                                                                                    |
|                                                                                                                                                       |
|   4 - filter(ROWNUM<=150)                                                                                                                             |
|   6 - filter(ROWNUM<=150)                                                                                                                             |
|   8 - filter((“TABLE_1”.”C”=1 AND “TABLE_1”.”D” IS NULL))                                         |
|   9 - access(“TABLE_1”.”B”='SOMETEXT' AND                                                                                |
|              “TABLE_1”.”E”='IN' AND ((“TABLE_1”.”F”='STATE1') OR (“TABLE_1”.”F”='STATE2'))                                              |
|  10 - access(“TOPNWRAPPER”.”ID”=“TABLE_2”.”T1_ID_FK”)                                                              |
|                                                                                                                                                       |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+

I've been looking around for a few days now. One suggestion I tried was using the hint /*+ USE_CONCAT (OR_PREDICATES(1)) */ and that helps a bit by reducing the mem usage by half, but it did not completely eliminate the problem.

EDIT: Was looking around (http://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by#tip-ixord-full) and think this might be due to the order by. If I change the order by statements to be: TABLE_1.F,TABLE_1.ID and TOPNWRAPPER.F,TOPNWRAPPER.ID ASC then the sort operation goes away, unfortunately I need the top n rows based on ID. Alternatively I tried making a new index on (ID F) to test with and it did also remove the sort operation, but ID is unique per row making the table access operations less effective.

EDIT 2:

OPERATION      |OPTION           |CPU COST
--------------------------------------------
SORT           |ORDER BY STOPKEY |21042774
|NESTED LOOPS  |OUTER            |56052
||TABLE ACCESS |BY INDEX ROWID   |38980
|||INDEX       |RANGE SCAN       |30086
like image 927
Lex Avatar asked Aug 25 '16 02:08

Lex


People also ask

How can I improve my Oracle query performance?

Partitioning your data and creating local partitioned indexes can improve your query performance. On a partitioned table, each partition has its own set of index tables. Effectively, there are multiple indexes, but the results from each are combined as necessary to produce the final result set.

How do you pass more than 1000 values in clause?

You cannot have more than 1000 literals in an IN clause. You can, however, have SELECT statements in your IN clause which can return an unlimited number of elements i.e. You might try using 'between' clause replacing 'in'... check documentation for correct syntax on using between.

WHAT IS TOP N in Oracle?

Top-N queries provide a method for limiting the number of rows returned from ordered sets of data. They are extremely useful when you want to return the top or bottom "N" number of rows from a set or when you are paging through data.


1 Answers

The performance difference probably doesn't matter. The execution plan difference is because multi-column index accesses are only implicitly sorted if the leading columns use equality conditions.

Performance Difference

Don't worry too much about the cost of an execution plan. Even though it's called the "Cost Based Optimizer", the cost is a weird number that only a few people in the world fully understand.

One reason why it is complicated to compare explain plan costs is that the total cost is sometimes less than one of the child operation costs. As I explain in my answer here, this can happen with a COUNT STOPKEY operation. This is Oracle's way of saying "this child operation would cost this huge amount, but the COUNT STOPKEY will probably cut it off before it gets that high". It's usually best to compare the top cost of the plan, but even that number can sometimes be misleading, as other examples in that answer demonstrate.

Which means that normally the run time is the only thing that matters. If the A-Time (actual time) is only 0.1 seconds for both queries then your job is probably done here.

Execution Plan Difference

The difference in the execution plans are caused by the ways multi-column indexes are stored and accessed. Sometimes when an index is scanned the results will be automatically stored and sometimes they will not. That's why one plan has COUNT STOPKEY and the other has the more expensive SORT ORDER BY STOPKEY.

To demonstrate this plan difference, create a simple table and index with just 2 columns and 4 rows:

create table test1 as 
select 1 a, 10 b from dual union all
select 1 a, 30 b from dual union all
select 2 a, 20 b from dual union all
select 2 a, 40 b from dual;

create index test1_idx on test1(a, b);

begin
    dbms_stats.gather_table_stats(user, 'TEST1');
end;
/

Below is a simplified idea of how the index is stored. The data is stored ordered by the leading column first, then ordered by the trailing column.

               +----+
        +------+Root+-------+
        |      +----+       |
        |                   |
      +-v-+               +-v-+
   +--+A=1+--+         +--+A=2+--+
   |  +---+  |         |  +---+  |
   |         |         |         |
 +-v--+   +--v-+     +-v--+   +--v-+
 |B=10|   |B=30|     |B=20|   |B=40|
 +----+   +----+     +----+   +----+

If the query only accesses one value in the leading column A, then it can read the values from column B in order without any extra effort. Oracle goes to one of the A blocks and then reads the B blocks in order without even trying.

Note how this query has an ORDER BY but there's no SORT in the execution plan.

explain plan for select * from test1 where a = 1 and b > 0 order by b;
select * from table(dbms_xplan.display(format => 'basic'));

Plan hash value: 598212486

--------------------------------------
| Id  | Operation        | Name      |
--------------------------------------
|   0 | SELECT STATEMENT |           |
|   1 |  INDEX RANGE SCAN| TEST1_IDX |
--------------------------------------

But if the query accesses more than one value in the leading column A, the results from B will not necessarily be retrieved in order. Oracle may read the A blocks in order, but the B block order is only true for one A value.

Now, an extra SORT ORDER BY operation shows up in the execution plan.

explain plan for select * from test1 where a in (1,2) and b > 0 order by b;
select * from table(dbms_xplan.display(format => 'basic'));

Plan hash value: 704117715

----------------------------------------
| Id  | Operation          | Name      |
----------------------------------------
|   0 | SELECT STATEMENT   |           |
|   1 |  SORT ORDER BY     |           |
|   2 |   INLIST ITERATOR  |           |
|   3 |    INDEX RANGE SCAN| TEST1_IDX |
----------------------------------------

This is why changing column1 = value1 to column1 in (value1, value2) may add an extra SORT operation.

like image 76
Jon Heller Avatar answered Oct 16 '22 03:10

Jon Heller