Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sql query optimisation

Please compare the following:

INNER JOIN table1 t1 ON t1.someID LIKE 'search.%' AND 
                        t1.someID = ( 'search.' || t0.ID )

vs.

INNER JOIN table1 t1 ON t1.someID = ( 'search.' || t0.ID )

I've been told, that the first case is optimized. But you know, I can not understand why it is. As far as I understand the 2nd example should run faster.

We use Oracle, but I suppose it does not matter at the moment.

Please explain if I'm wrong.

Thank you

like image 796
Alexandr Avatar asked May 05 '11 05:05

Alexandr


2 Answers

So, here is the explain plan for a query which joins on just the concatenated string:

SQL> explain plan for
  2     select e.* from emp e
  3         join big_table bt on bt.col2 = 'search'||trim(to_char(e.empno))
  4  /

Explained.

SQL> select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 179424166

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |  1052 | 65224 |    43   (0)| 00:00:01 |
|   1 |  NESTED LOOPS      |          |  1052 | 65224 |    43   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMP      |    20 |   780 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | BIG_VC_I |    53 |  1219 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))

15 rows selected.

SQL>

Compare and contrast with the plan for a query which includes the LIKE clause in its join:

SQL> explain plan for
  2     select e.* from emp e
  3           join big_table bt on (bt.col2 like 'search%'
  4               and bt.col2 = 'search'||trim(to_char(e.empno)))
  5  /

Explained.

SQL> select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 179424166

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |    62 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS      |          |     1 |    62 |     5   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| EMP      |     1 |    39 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | BIG_VC_I |     1 |    23 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter('search'||TRIM(TO_CHAR("E"."EMPNO")) LIKE 'search%')
   3 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))
       filter("BT"."COL2" LIKE 'search%')

17 rows selected.

SQL>

The cost of the second query is much lower than the first. But this is because the optimizer is estimating that the second query will return far fewer rows than the first query. More information allows the database to make a more accurate prediction. (In fact the query will return no rows).

Of course this does presume the joined column is indexed, otherwise it won't make any difference.

The other thing to bear in mind is that the columns which are queried can affect the plan. This version selects from BIG_TABLE rather than EMP.

SQL> explain plan for
  2     select bt.* from emp e
  3           join big_table bt on (bt.col2 like 'search%'
  4                        and bt.col2 = 'search'||trim(to_char(e.empno)))
  5  /

Explained.

SQL> select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------

Plan hash value: 4042413806

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |     1 |    46 |     4   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |           |       |       |            |          |
|   2 |   NESTED LOOPS               |           |     1 |    46 |     4   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN           | PK_EMP    |     1 |     4 |     1   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | BIG_VC_I  |     1 |       |     2   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| BIG_TABLE |     1 |    42 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter('search'||TRIM(TO_CHAR("E"."EMPNO")) LIKE 'search%')
   4 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))
       filter("BT"."COL2" LIKE 'search%')

19 rows selected.

SQL>
like image 194
APC Avatar answered Sep 25 '22 06:09

APC


The query analysis of the various database engines would really tell the story but my first instinct would be that the first form is in fact optimized. The reason is that the compiler cannot guess as the to results of the concatenation. It must do more work to determine the value against which to do the match and would likely result in a table scan. The first still must do that, however, it is able to narrow the resultset using the LIKE operator (presuming an index exists on the someID column) first and thus has to do fewer concatenations.

like image 26
Thomas Avatar answered Sep 23 '22 06:09

Thomas