Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

More Efficient Way to Grab the Latest Value based off a composite index?

Tags:

sql

oracle

I have a table called SAMPLE_TABLE that has the following columns, with CAR_TYPE, COLOR and CAR_BRAND making up the composite index.

 VALUE_ID      VALUE      CAR_TYPE     COLOR     SUBMIT_DT      CAR_BRAND
   1            10        Sedan        Blue      3/7/2019       Ford
   2            70        Sedan        Blue      3/6/2019       Ford
   3            20        Sedan        Blue      3/5/2019       Ford
   4            77         SUV         Red       3/7/2019       Volvo
   5           100         SUV         Red       3/1/2019       Volvo

Is there a way I can write a more efficient way to query for the Value correlated to the LATEST SUBMIT_DT? In the future, the TABLE will have millions of rows of data, so I will need to find a query with the lowest run-time/cost that can query.

For example, below is what I would want in my result set when querying for a Blue Ford Sedan:

 VALUE
 10

Below is what I have thus far:

 SELECT value 
   FROM (
          SELECT *
            FROM TABLE
           WHERE CAR_TYPE = rCar_Type
             AND COLOR = rColor
             AND CAR_BRAND = rCar_Brand
            ORDER by submit_dt desc
        )
   WHERE rownum = 1;

Is this inefficient?

Thanks in advance

like image 308
John Wick Avatar asked Dec 08 '22 12:12

John Wick


2 Answers

Wow... there are a lot of answers already, but I think some of them missed what I think is the point of your question.

You are going to have millions rows in your table and your composite index on (CAR_TYPE, COLOR, CAR_BRAND) will not be very selective. You are looking for a way to get the one row having the last SUBMIT_DT for a given entry in your composite index without having to read through ALL of the matches from that index.

Answer: add SUBMIT_DT DESC to your composite index

Let's set up a test:

create table matt_objects as select * from dba_objects;

-- This is our analog of your composite index
create index matt_objects_n1 on matt_objects ( object_type, owner );

exec dbms_stats.gather_table_stats(user,'MATT_OBJECTS');

Now, let's autotrace this statement:

select object_name
from   matt_objects
where  object_type = 'TABLE'
and    owner = 'INV'
order by last_ddl_time desc
fetch first 1 row only;
---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                 |     1 |    88 |    17   (6)| 00:00:01 |
|*  1 |  VIEW                                 |                 |     1 |    88 |    17   (6)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK             |                 |   162 |  7290 |    17   (6)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| MATT_OBJECTS    |   162 |  7290 |    16   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | MATT_OBJECTS_N1 |   162 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("LAST_DDL_TIME") DESC )<=1)
   4 - access("OBJECT_TYPE"='TABLE' AND "OWNER"='INV')

Result (from autotrace): 72 consistent read buffer gets

Now, let's replace your composite index with one that will help us out more:

drop index matt_objects_n1;

create index matt_objects_n1 on matt_objects ( object_type, owner, last_ddl_time desc );

exec dbms_stats.gather_table_stats(user,'MATT_OBJECTS');

.. and let's autotrace that same statement again:

--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 |     1 |    88 |    54   (2)| 00:00:01 |
|   1 |  SORT ORDER BY                 |                 |     1 |    88 |    54   (2)| 00:00:01 |
|*  2 |   VIEW                         |                 |     1 |    88 |    53   (0)| 00:00:01 |
|*  3 |    WINDOW NOSORT STOPKEY       |                 |   162 |  7290 |    53   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| MATT_OBJECTS    |   162 |  7290 |    53   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | MATT_OBJECTS_N1 |   162 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
   3 - filter(ROW_NUMBER() OVER ( ORDER BY SYS_OP_DESCEND("LAST_DDL_TIME"))<=1)
   5 - access("OBJECT_TYPE"='TABLE' AND "OWNER"='INV')

Result (from autotrace): 5 consistent read gets

That index helped a lot. Notice the plan is different? 'WINDOW SORT PUSHED RANK' has been replaced by 'WINDOW NOSORT STOPKEY'. With the index already sorted the way you want (in descending order), Oracle knows it can read the index rows in order and stop after the first one -- completing the query with much less effort.

It's interesting to note that the cost of the 2nd query is higher than the cost of the 1st query, even though the performance of the 2nd query is more than 10 times better. It just goes to show you that "cost" is an estimate and should taken with a grain of salt sometimes.

like image 133
Matthew McPeak Avatar answered Jan 19 '23 00:01

Matthew McPeak


Well, query you wrote can't exactly be called "inefficient" but "useless" in this context as it'll return one, random row. You're probably missing ORDER BY in a subquery.

Anyway: see how this behaves:

select value
  from (select row_number() over (partition by car_type, color, car_brand
                                  order by submit_dt desc) rn,
               value
        from sample_table
        where car_type = rcar_type
          and color = rcolor
          and car_brand = rcar_brand
       )
where rn = 1;       

Don't forget to create index on columns used in the WHERE clause.

like image 32
Littlefoot Avatar answered Jan 19 '23 01:01

Littlefoot