Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle - Table alias and NULL evaluation in join

Tags:

sql

null

oracle

I was just trying to make an example to explain how NULL in Oracle can lead to 'unexpected' behaviours, but I've found something I did not expect...

setup:

create table tabNull (val varchar2(10), descr varchar2(100));
insert into tabNull values (null, 'NULL VALUE');
insert into tabNull values ('A', 'ONE CHAR');

This gives what I expected:

SQL> select * from tabNull T1 inner join tabNull T2 using(val);

VAL        DESCR                DESCR
---------- -------------------- --------------------
A          ONE CHAR             ONE CHAR

If I remove table aliases, I get:

SQL> select * from tabNull inner join tabNull using(val);

VAL        DESCR                DESCR
---------- -------------------- --------------------
A          ONE CHAR             ONE CHAR
A          ONE CHAR             ONE CHAR

and this is quite surprising to me.

A reason can be found in the execution plans for the two queries; with table aliases, Oracle makes an HASH JOIN and then checks for T1.val = T2.val:

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         |     1 |   118 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN         |         |     1 |   118 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| TABNULL |     2 |   118 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| TABNULL |     2 |   118 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."VAL"="T2"."VAL")

Without aliases, it first filters one occurrence of the table for not null values, thus picking only one row, and then it makes a CARTESIAN with the second occurrence, thus giving two rows; even if it's correct, I would expect the result of a cartesian, but I don't have any row with DESCR = 'NULL VALUE'.

--------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |     2 |   118 |     6   (0)| 00:00:01 |
|   1 |  MERGE JOIN CARTESIAN|         |     2 |   118 |     6   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | TABNULL |     1 |    59 |     3   (0)| 00:00:01 |
|   3 |   BUFFER SORT        |         |     2 |       |     3   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL | TABNULL |     2 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TABNULL"."VAL" IS NOT NULL)

Is this somehow correct / expected? Isn't the result value of the cartesian even stranger than the number of returned rows? Am I misunderstanding the plans, or missing something so big that I can't see?

like image 786
Aleksej Avatar asked Aug 12 '16 12:08

Aleksej


People also ask

What is (+) in Oracle join?

The plus sign is Oracle syntax for an outer join. There isn't a minus operator for joins. An outer join means return all rows from one table. Also return the rows from the outer joined where there's a match on the join key. If there's no matching row, return null.

Can we use alias name in having clause in Oracle?

column_alias can be used in an ORDER BY clause, but it cannot be used in a WHERE, GROUP BY, or HAVING clause. Standard SQL disallows references to column aliases in a WHERE clause. This restriction is imposed because when the WHERE clause is evaluated, the column value may not yet have been determined.

Can we compare null values in Oracle?

Because null represents a lack of data, a null cannot be equal or unequal to any value or to another null. However, Oracle considers two nulls to be equal when evaluating a DECODE function.

Does Count return null?

COUNT never returns null. The following example calculates, for each employee in the employees table, the moving count of employees earning salaries in the range 50 less than through 150 greater than the employee's salary.


2 Answers

According to http://docs.oracle.com/javadb/10.10.1.2/ref/rrefsqljusing.html using(val) translates here as ON tabnull.val=tabnull.val So

select tabNull.*, tabNull.descr from tabNull inner join tabNull 
on tabNull.val = tabNull.val;

Next to build a plan Oracle must [virtually] assign different aliases for every JOIN member but sees no reason to use second alias at any place in SELECT and ON. So

select t1.*, t1.descr from tabNull t1 inner join tabNull t2 
on t1.val = t1.val;

Plan

--------------------------------------------------------------------------------                        
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |                        
--------------------------------------------------------------------------------                        
|   0 | SELECT STATEMENT     |         |     2 |    28 |     4   (0)| 00:00:01 |                        
|   1 |  MERGE JOIN CARTESIAN|         |     2 |    28 |     4   (0)| 00:00:01 |                        
|*  2 |   TABLE ACCESS FULL  | TABNULL |     1 |    14 |     2   (0)| 00:00:01 |                        
|   3 |   BUFFER SORT        |         |     2 |       |     2   (0)| 00:00:01 |                        
|   4 |    TABLE ACCESS FULL | TABNULL |     2 |       |     2   (0)| 00:00:01 |                        
--------------------------------------------------------------------------------                        


Predicate Information (identified by operation id):                                                     
---------------------------------------------------                                                     

   2 - filter("T1"."VAL" IS NOT NULL)    
like image 171
Serg Avatar answered Oct 03 '22 23:10

Serg


EDIT: I say below that the syntax is illegal; on further thought, that's BS on my part, I don't know that for a fact (I can't point to where in the language definition aliases are required for a self-join). I still believe the explanation below is probably correct, whether it is for the "bug" or for the "undefined behavior" I mention below.

*

The syntax is illegal (you knew that - you were just curious to see what would happen, and if you can understand the output). I agree with jarlh that you should have received an error message. Clearly Oracle didn't code it that way.

Since this is not valid syntax, what you are seeing can't be called a bug (so I disagree with Nick's comment). The behavior is "undefined" - when you use syntax that is not supported by the Oracle language definition, you may get any kind of crazy results, for which Oracle is not taking any responsibility.

OK, with that out of the way, is there any explanation for what you are seeing? I believe it is indeed a Cartesian join, and not a union as Nick suggested.

Let's put ourselves in the optimizer's shoes. It sees the first table in the FROM list, it scans it, so far so good.

Then it reads the second table, and it has a list of columns like this:

tabNULL.val, tabNULL.descr, tabNULL.val, tabNULL.descr

The join condition is tabNULL.val = tabNULL.val

The optimizer is dumb, it is not smart. It, unlike you, doesn't realize at this point that tabNULL is meant to stand for two different incarnations of the table. It thinks tabNULL.val on both sides of the equation are THE SAME value and they both refer to the first "incarnation" of the table. The only case when that fails is if tabNULL.val is NULL, so it REWRITES the query with the clause becoming tabNULL.val IS NOT NULL.

Only the FIRST table is checked for tabNULL.val IS NOT NULL; the optimizer doesn't "know" tabNULL.val appears again in the list and it may have a DIFFERENT meaning! Then the join happens; at this point there are no other conditions left, so BOTH rows in the second incarnation of the table will produce rows in the join, for A, ONE CHAR from the first table.

Then, in the projection, again only the FIRST tabNULL.val will be read and will populate BOTH columns in the output. You ask the query engine to return the value tabNULL.val twice, and in your mind it's from different places, but there is only one memory location labeled tabNULL.val, and it stores what came from the first table.

Of course, very few know with any certainty what the optimizer and the query engine do, but in this case I think this is a pretty safe guess.

like image 30
mathguy Avatar answered Oct 03 '22 21:10

mathguy