Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle SQL execution plan changes due to SYS_OP_C2C internal conversion

I'm wondering why cost of this query

select * from address a
left join name n on n.adress_id=a.id
where a.street='01';

is higher than

select * from address a
left join name n on n.adress_id=a.id
where a.street=N'01';

where address table looks like this

ID              NUMBER
STREET          VARCHAR2(255 CHAR)
POSTAL_CODE     VARCHAR2(255 CHAR)

and name table looks like this

ID              NUMBER
ADDRESS_ID      NUMBER
NAME            VARCHAR2(255 CHAR)
SURNAME         VARCHAR2(255 CHAR)

These are costs returned by explain plan

Explain plan for '01'

-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |  3591 |  1595K|    87   (0)| 00:00:02 |
|   1 |  NESTED LOOPS OUTER          |                      |  3591 |  1595K|    87   (0)| 00:00:02 |
|*  2 |   TABLE ACCESS FULL          | ADDRESS              |     3 |   207 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS BY INDEX ROWID| NAME                 |  1157 |   436K|    47   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | NAME_HSI             |  1157 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("A"."STREET"='01')
   4 - access("N"."ADDRESS_ID"(+)="A"."ID")

Explain plan for N'01'

-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |   347 |   154K|    50   (0)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER          |                      |   347 |   154K|    50   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL          | ADDRESS              |     1 |    69 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS BY INDEX ROWID| NAME                 |  1157 |   436K|    47   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | NAME_HSI             |  1157 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SYS_OP_C2C("A"."STREET")=U'01')
   4 - access("N"."ADDRESS_ID"(+)="A"."ID")

As you can see cost for N'01' query is lower than cost for '01'. Any idea why? N'01' needs additionally convert varchar to nvarchar so cost should be higher (SYS_OP_C2C()). The other question is why rows processed by N'01' query is lower than '01'?

[EDIT]

  • Table address has 30 rows.
  • Table name has 19669 rows.
like image 254
pepuch Avatar asked Oct 10 '14 09:10

pepuch


People also ask

What are the reasons for SQL plan change in Oracle?

A plan change can occur due for a variety of reasons including but not limited to the following types of changes occurring in the system: optimizer version, optimizer statistics, optimizer parameters, schema/metadata definitions, system settings, as well as SQL profile creation.

How do I know if execution plan has changed?

You can check if the SQL execution plan has changed by using the Active Workload Repository (AWR). First, you need to find the SQL_ID for the relevant query. The view GV$SQL contains the most recent SQL. If you can't find the query in this view, try DBA_HIST_SQLTEXT instead.

What is SYS_OP_C2C?

SYS_OP_C2C is an internal function which does an implicit conversion of varchar2 to national character set using TO_NCHAR function. Thus, the filter completely changes as compared to the filter using normal comparison.


2 Answers

SYS_OP_C2C is an internal function which does an implicit conversion of varchar2 to national character set using TO_NCHAR function. Thus, the filter completely changes as compared to the filter using normal comparison.

I am not sure about the reason why the number of rows are less, but I can guarantee it could be more too. Cost estimation won't be affected.

Let's try to see step-by-step in a test case.

SQL> CREATE TABLE t AS SELECT 'a'||LEVEL col FROM dual CONNECT BY LEVEL < 1000;

Table created.

SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = 'a10';

Explained.

SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     5 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |     5 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

   1 - filter("COL"='a10')

13 rows selected.

SQL>

So far so good. Since there is only one row with value as 'a10', optimizer estimated one row.

Let's see with the national characterset conversion.

SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = N'a10';

Explained.

SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |    50 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    10 |    50 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

   1 - filter(SYS_OP_C2C("COL")=U'a10')

13 rows selected.

SQL>

What happened here? We can see filter(SYS_OP_C2C("COL")=U'a10'), which means an internal function is applied and it converts the varchar2 value to nvarchar2. The filter now found 10 rows.

This will also suppress any index usage, since now a function is applied on the column. We can tune it by creating a function-based index to avoid full table scan.

SQL> create index nchar_indx on t(to_nchar(col));

Index created.

SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE to_nchar(col) = N'a10';

Explained.

SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1400144832

--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |    10 |    50 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T          |    10 |    50 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | NCHAR_INDX |     4 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------

   2 - access(SYS_OP_C2C("COL")=U'a10')

14 rows selected.

SQL>

However, will this make the execution plans similar? No. i think with two different charactersets , the filter will not be applied alike. Thus, the difference lies.

My research says,

Usually, such scenarios occur when the data coming via an application is nvarchar2 type, but the table column is varchar2. Thus, Oracle applies an internal function in the filter operation. My suggestion is, to know your data well, so that you use similar data types during design phase.

like image 74
Lalit Kumar B Avatar answered Sep 16 '22 13:09

Lalit Kumar B


When worrying about explain plans, it matters whether there are current statistics on the tables. If the statistics do not represent the actual data reasonably well, then the optimizer will make mistakes and estimate cardinalities incorrectly.

You can check how long ago statistics were gathered by querying the data dictionary:

select table_name, last_analyzed
  from user_tables
 where table_name in ('ADDRESS','NAME');

You can gather statistics for the optimizer to use by calling DBMS_STATS:

begin
   dbms_stats.gather_table_stats(user, 'ADDRESS');
   dbms_stats.gather_table_stats(user, 'NAME');
end;

So perhaps after gathering statistics you will get different explain plans. Perhaps not.

The difference in your explain plans is primarily because the optimizer estimates how many rows it will find in address table differently in the two cases.

In the first case you have an equality predicate with same datatype - this is good and the optimizer can often estimate cardinality (row count) reasonably well for cases like this.

In the second case a function is applied to the column - this is often bad (unless you have function based indexes) and will force the optimizer to take a wild guess. That wild quess will be different in different versions of Oracle as the developers of the optimizer tries to improve upon it. Some versions the wild guess will simply be something like "I guess 5% of the number of rows in the table."

When comparing different datatypes, it is best to avoid implicit conversions, particularly when like this case the implicit conversion makes a function on the column rather than the literal. If you have cases where you get a value as datatype NVARCHAR2 and need to use it in a predicate like above, it can be a good idea to explicitly convert the value to the datatype of the column.

select * from address a
left join name n on n.adress_id=a.id
where a.street = CAST( N'01' AS VARCHAR2(255));

In this case with a literal it does not make sense, of course. Here you would just use your first query. But if it was a variable or function parameter, maybe you could have use cases for doing something like this.

like image 43
Kim Berg Hansen Avatar answered Sep 17 '22 13:09

Kim Berg Hansen