Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Join elimination not working in Oracle with sub queries

I am able to get join elimination to work for simple cases such as one-to-one relations, but not for slightly more complicated scenarios. Ultimately I want to try anchor modelling, but first I need to find a way around this problem. I'm using Oracle 12c Enterprise Edition Release 12.1.0.2.0.

DDL for my test case:

drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product       cascade constraints;

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk primary key(product_id, from_date)
);

Some example data:

insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price values(1, date '2016-01-01', 10);
insert into product_price values(1, date '2016-02-01', 8);
insert into product_price values(1, date '2016-05-01', 5);

insert into product_price values(2, date '2016-02-01', 5);

insert into product_price values(4, date '2016-01-01', 10);

commit;

The 5NF view

This first view does not compile - it fails with ORA-01799: a column may not be outer-joined to a subquery. Unfortunately, this is how most of the historized views are defined when I'm looking at the online examples of anchor modelling...

create view product_5nf as
   select p.product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     )
     left join product_price pp on(
          pp.product_id = p.product_id
      and pp.from_date  = (select max(pp2.from_date) 
                             from product_price pp2 
                            where pp2.product_id = pp.product_id)
     );

Below is my attempt at fixing it. When using this view with a simple select of product_id, Oracle manages to eliminate product_color but not product_price.

create view product_5nf as
   select product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc using(product_id)
     left join (select pp1.product_id, pp1.price 
                  from product_price pp1
                 where pp1.from_date  = (select max(pp2.from_date) 
                                           from product_price pp2 
                                          where pp2.product_id = pp1.product_id)
              )pp using(product_id);

select product_id
  from product_5nf;

----------------------------------------------------------
| Id  | Operation             | Name             | Rows  |
----------------------------------------------------------
|   0 | SELECT STATEMENT      |                  |     4 |
|*  1 |  HASH JOIN OUTER      |                  |     4 |
|   2 |   INDEX FAST FULL SCAN| PRODUCT_PK       |     4 |
|   3 |   VIEW                |                  |     3 |
|   4 |    NESTED LOOPS       |                  |     3 |
|   5 |     VIEW              | VW_SQ_1          |     5 |
|   6 |      HASH GROUP BY    |                  |     5 |
|   7 |       INDEX FULL SCAN | PRODUCT_PRICE_PK |     5 |
|*  8 |     INDEX UNIQUE SCAN | PRODUCT_PRICE_PK |     1 |
----------------------------------------------------------

The only solution I have found is to use scalar sub queries instead, like this:

create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,(select pp.price
             from product_price pp
            where pp.product_id = p.product_id
              and pp.from_date = (select max(from_date)
                                    from product_price pp2
                                   where pp2.product_id = pp.product_id)) as price
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     )

select product_id
  from product_5nf;

---------------------------------------------------
| Id  | Operation            | Name       | Rows  |
---------------------------------------------------
|   0 | SELECT STATEMENT     |            |     4 |
|   1 |  INDEX FAST FULL SCAN| PRODUCT_PK |     4 |
---------------------------------------------------

Now Oracle sucessfully eliminates the product_price table. However, scalar sub queries are implemented differently than joins and they way they are executed simply doesn't allow me to get any acceptable performance in a real world scenario.

TL;DR How can I rewrite the view product_5nf so that Oracle sucessfully eliminates both of the dependent tables?

like image 842
Ronnis Avatar asked Nov 08 '16 15:11

Ronnis


People also ask

Can we use joins in subquery?

Use a join or a subquery anytime that you reference information from multiple tables. Joins and subqueries are often used together in the same query. In many cases, you can solve a data retrieval problem by using a join, a subquery, or both.

Is it better to use subqueries instead of join?

The optimizer is more mature for MYSQL for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as join. We cannot modify a table and select from the same table within a subquery in the same SQL statement.

Is it possible to join two or more tables data using subqueries?

It is possible, and indeed common, to join more than just two tables together. This is done by adding additional JOIN clauses to your SELECT statement.

Is subquery faster than join?

A general rule is that joins are faster in most cases (99%). The more data tables have, the subqueries are slower. The less data tables have, the subqueries have equivalent speed as joins. The subqueries are simpler, easier to understand, and easier to read.


2 Answers

I think you have two problems going on here.

First, join elimination only works for certain, specific situations (PK-PK, PK-FK, etc). It is not a general thing where you can LEFT JOIN to any row set that will return a single row for each join key value and have Oracle eliminate the join.

Second, even if Oracle were advanced enough to do join elimination on ANY LEFT JOIN where it knew it would get only one row per join key value, Oracle does not yet support join eliminations on LEFT JOINS that are based on a composite key (Oracle support document 887553.1 says this is coming in R12.2).

One workaround you could consider is materializing a view with the last row for each product_id. Then LEFT JOIN to the materialized view. Like this:

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk  primary key (product_id, from_date )
);

-- Add a VIRTUAL column to PRODUCT_PRICE so that we can get all the data for 
-- the latest row by taking the MAX() of this column.
alter table product_price add ( sortable_row varchar2(80) generated always as ( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0'))  virtual not null );

-- Create a MV snapshot so we can materialize a view having only the latest
-- row for each product_id and can refresh that MV fast on commit.
create materialized view log on product_price with sequence, primary key, rowid ( price  ) including new values;

-- Create the MV
create materialized view product_price_latest refresh fast on commit enable query rewrite as
SELECT product_id, max( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0')) sortable_row
FROM   product_price
GROUP BY product_id;

-- Create a primary key on the MV, so we can do join elimination
alter table product_price_latest add constraint ppl_pk primary key ( product_id );

-- Insert the OP's test data
insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price ( product_id, from_date, price ) values(1, date '2016-01-01', 10 );
insert into product_price ( product_id, from_date, price) values(1, date '2016-02-01', 8);
insert into product_price ( product_id, from_date, price) values(1, date '2016-05-01', 5);

insert into product_price ( product_id, from_date, price) values(2, date '2016-02-01', 5);

insert into product_price ( product_id, from_date, price) values(4, date '2016-01-01', 10);

commit;

-- Create the 5NF view using the materialized view
create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,to_date(substr(ppl.sortable_row,11,14),'YYYYMMDDHH24MISS') from_date
         ,to_number(substr(ppl.sortable_row,25)) price 
     from product p
     left join product_color pc on pc.product_id = p.product_id
     left join product_price_latest ppl on ppl.product_id = p.product_id 
;

-- The plan for this should not include any of the unnecessary tables.
select product_id from product_5nf;

-- Check the plan
SELECT *
FROM   TABLE (DBMS_XPLAN.display_cursor (null, null,
                                         'ALLSTATS LAST'));

------------------------------------------------
| Id  | Operation        | Name       | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT |            |        |
|   1 |  INDEX FULL SCAN | PRODUCT_PK |      1 |
------------------------------------------------
like image 68
Matthew McPeak Avatar answered Oct 13 '22 09:10

Matthew McPeak


I can't make the price join be eliminated, but if you do the following, it can at least reduce the access to a single index for the price check:

CREATE OR REPLACE view product_5nf as
select p.product_id
      ,pc.color
      ,pp.price 
 from product p
 left join product_color pc ON p.product_id = pc.product_id
 left join (select pp1.product_id, pp1.price 
              from (SELECT product_id,
                           price,
                           from_date,
                           max(from_date) OVER (PARTITION BY product_id) max_from_date
                    FROM   product_price) pp1
             where pp1.from_date = max_from_date) pp ON p.product_id = pp.product_id;
like image 24
Boneist Avatar answered Oct 13 '22 07:10

Boneist