Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to tune a range / interval query in Oracle?

I have a table A with intervals (COL1, COL2):

CREATE TABLE A (
  COL1 NUMBER(15) NOT NULL,
  COL2 NUMBER(15) NOT NULL,
  VAL1 ...,
  VAL2 ...
);
ALTER TABLE A ADD CONSTRAINT COL1_BEFORE_COL2 CHECK (COL1 <= COL2);

The intervals are guaranteed to be "exclusive", i.e. they will never overlap. In other words, this query yields no rows:

SELECT *
FROM (
  SELECT
    LEAD(COL1, 1) OVER (ORDER BY COL1) NEXT,
    COL2
  FROM A
)
WHERE COL2 >= NEXT;

There is currently an index on (COL1, COL2). Now, my query is the following:

SELECT /*+FIRST_ROWS(1)*/ *
FROM A
WHERE :some_value BETWEEN COL1 AND COL2
AND ROWNUM = 1

This performs well (less than a ms for millions of records in A) for low values of :some_value, because they're very selective on the index. But it performs quite badly (almost a second) for high values of :some_value because of a lower selectivity of the access predicate.

The execution plan seems good to me. As the existing index already fully covers the predicate, I get the expected INDEX RANGE SCAN:

------------------------------------------------------
| Id  | Operation                    | Name | E-Rows |
------------------------------------------------------
|   0 | SELECT STATEMENT             |      |        |
|*  1 |  COUNT STOPKEY               |      |        |
|   2 |   TABLE ACCESS BY INDEX ROWID| A    |      1 |
|*  3 |    INDEX RANGE SCAN          | A_PK |        |
------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   3 - access("VAL2">=:some_value AND "VAL1"<=:some_value)
       filter("VAL2">=:some_value)

In 3, it becomes obvious that the access predicate is selective only for low values of :some_value whereas for higher values, the filter operation "kicks in" on the index.

Is there any way to generally improve this query to be fast regardless of the value of :some_value? I can completely redesign the table if further normalisation is needed.

like image 770
Lukas Eder Avatar asked Mar 06 '14 14:03

Lukas Eder


2 Answers

Your attempt is good, but misses a few crucial issues.

Let's start slowly. I'm assuming an index on COL1 and I actually don't mind if COL2 is included there as well.

Due to the constraints you have on your data (especially non-overlapping) you actually just want the row before the row where COL1 is <= some value....[--take a break--] it you order by COL1

This is a classic Top-N query:

select *
  FROM ( select *
          from A
         where col1 <= :some_value
         order by col1 desc
       )
 where rownum <= 1;

Please note that you must use ORDER BY to get a definite sort order. As WHERE is applied after ORDER BY you must now also wrap the top-n filter in an outer query.

That's almost done, the only reason why we actually need to filter on COL2 too is to filter out records that don't fall into the range at all. E.g. if some_value is 5 and you are having this data:

  COL1 | COL2
     1 |  2
     3 |  4   <-- you get this row 
     6 | 10

This row would be correct as result, if COL2 would be 5, but unfortunately, in this case the correct result of your query is [empty set]. That's the only reason we need to filter for COL2 like this:

select *
  FROM ( select *
           FROM ( select *
                    from A
                   where col1 <= :some_value
                   order by col1 desc
                )
          where rownum <= 1
        )
  WHERE col2 >= :some_value;

Your approach had several problems:

  • missing ORDER BY - dangerous in connection with rownum filter!
  • applying the Top-N clause (rownum filter) too early. What if there is no result? Database reads index until the end, the rownum (STOPKEY) never kicks in.
  • An optimizer glitch. With the between predicate, my 11g installation doesn't come to the idea to read the index in descending order, so it was actually reading it from the beginning (0) upwards until it found a COL2 value that matched --OR-- the COL1 run out of the range.

.

COL1 | COL2
   1 |  2   ^
   3 |  4   |      (2) go up until first match.
            +----- your intention was to start here
   6 | 10

What was actually happening was:

  COL1 | COL2
     1 |  2   +----- start at the beginning of the index
     3 |  4   |      Go down until first match.      
              V
     6 | 10

Look at the execution plan of my query:

------------------------------------------------------------------------------------------
| Id  | Operation                       | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |        |     1 |    26 |     4   (0)| 00:00:01 |
|*  1 |  VIEW                           |        |     1 |    26 |     4   (0)| 00:00:01 |
|*  2 |   COUNT STOPKEY                 |        |       |       |            |          |
|   3 |    VIEW                         |        |     2 |    52 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID | A      | 50000 |   585K|     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN DESCENDING| SIMPLE |     2 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Note the INDEX RANGE SCAN **DESCENDING**.

Finally, why didn't I include COL2 in the index? It's a single row-top-n query. You can save at most a single table access (irrespective of what the Rows estimation above says!) If you expect to find a row in most cases, you'll need to go to the table anyways for the other columns (probably) so you would not save ANYTHING, just consume space. Including the COL2 will only improve performance if you query doesn't return anything at all!

Related:

  • How to use index efficienty in mysql query I answered a very similar question about this years ago. Same solution.
  • Use The Index, Lukas!
like image 102
Markus Winand Avatar answered Oct 21 '22 09:10

Markus Winand


I think, because the ranges do not intersect, you can define col1 as primary key and execute the query like this:

SELECT *
  FROM    a
       JOIN
          (SELECT MAX (col1) AS col1
             FROM a
            WHERE col1 <= :somevalue) b
       ON a.col1 = b.col1;

If there are gaps between the ranges you wil have to add:

Where col2 >= :somevalue

as last line.

Execution Plan:

SELECT STATEMENT  
 NESTED LOOPS  
  VIEW  
   SORT AGGREGATE 
    FIRST ROW  
     INDEX RANGE SCAN (MIN/MAX) PKU1
  TABLE ACCESS BY INDEX A
   INDEX UNIQUE SCAN PKU1
like image 3
Thomas Köhne Avatar answered Oct 21 '22 10:10

Thomas Köhne