I was just trying to make an example to explain how NULL in Oracle can lead to 'unexpected' behaviours, but I've found something I did not expect...
Setup:
create table tabNull (val varchar2(10), descr varchar2(100));
insert into tabNull values (null, 'NULL VALUE');
insert into tabNull values ('A', 'ONE CHAR');
This gives what I expected:
SQL> select * from tabNull T1 inner join tabNull T2 using(val);
VAL        DESCR                DESCR
---------- -------------------- --------------------
A          ONE CHAR             ONE CHAR
If I remove table aliases, I get:
SQL> select * from tabNull inner join tabNull using(val);
VAL        DESCR                DESCR
---------- -------------------- --------------------
A          ONE CHAR             ONE CHAR
A          ONE CHAR             ONE CHAR
and this is quite surprising to me.
A reason can be found in the execution plans for the two queries. With table aliases, Oracle performs a HASH JOIN with T1.val = T2.val as the access predicate:
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 118 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 118 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| TABNULL | 2 | 118 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TABNULL | 2 | 118 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."VAL"="T2"."VAL")
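For comparison, the aliased USING join can be written with an explicit ON clause, which matches the access predicate in the plan above. This is only a sketch; the column aliases descr_1 and descr_2 are added here for illustration and are not part of the original query.

```sql
-- Explicit ON form of the aliased self-join.
-- NULL = NULL does not evaluate to true, so the row with a NULL val
-- never matches itself and is dropped by the inner join.
select t1.val,
       t1.descr as descr_1,   -- illustrative alias
       t2.descr as descr_2    -- illustrative alias
from   tabNull t1
       inner join tabNull t2
       on t1.val = t2.val;
```

With the two rows from the setup, this should return the same single row for 'A' as the aliased query.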
Without aliases, it first filters one occurrence of the table for non-null values, keeping only one row, and then does a MERGE JOIN CARTESIAN with the second occurrence, which gives two rows. Even if that is correct, I would expect a cartesian result to include a row with DESCR = 'NULL VALUE' coming from the second occurrence, but there is none:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 118 | 6 (0)| 00:00:01 |
| 1 | MERGE JOIN CARTESIAN| | 2 | 118 | 6 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | TABNULL | 1 | 59 | 3 (0)| 00:00:01 |
| 3 | BUFFER SORT | | 2 | | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | TABNULL | 2 | | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("TABNULL"."VAL" IS NOT NULL)
Is this somehow correct / expected? Isn't the result value of the cartesian even stranger than the number of returned rows? Am I misunderstanding the plans, or missing something so big that I can't see it?
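For anyone who wants to reproduce the plans shown above, one common way is EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY. This is a minimal sketch that assumes a plan table is available to the session, which is normally the case on recent Oracle versions; the exact costs and row estimates will differ from one environment to another.

```sql
-- Store the plan for the alias-free query in the plan table ...
explain plan for
select * from tabNull inner join tabNull using(val);

-- ... then print it, including the "Predicate Information" section.
select * from table(dbms_xplan.display);
```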
According to http://docs.oracle.com/javadb/10.10.1.2/ref/rrefsqljusing.html, using(val) translates here to ON tabnull.val = tabnull.val.
So the query becomes:
select tabNull.*, tabNull.descr from tabNull inner join tabNull
on tabNull.val = tabNull.val;
Next, to build a plan, Oracle must (virtually) assign a different alias to each member of the JOIN, but it sees no reason to use the second alias anywhere in the SELECT list or in the ON clause. So:
select t1.*, t1.descr from tabNull t1 inner join tabNull t2
on t1.val = t1.val;
Plan
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 28 | 4 (0)| 00:00:01 |
| 1 | MERGE JOIN CARTESIAN| | 2 | 28 | 4 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | TABNULL | 1 | 14 | 2 (0)| 00:00:01 |
| 3 | BUFFER SORT | | 2 | | 2 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | TABNULL | 2 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T1"."VAL" IS NOT NULL)
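Read literally, this plan corresponds to a filtered cross join. The query below is a hedged sketch of that reading, not something Oracle is documented to produce; the aliases descr_1 and descr_2 are illustrative only.

```sql
-- What the plan appears to compute: filter the first occurrence for
-- non-null val, then pair every surviving row with every row of the
-- second occurrence, projecting only columns of the first occurrence.
select t1.val,
       t1.descr as descr_1,   -- illustrative alias
       t1.descr as descr_2    -- illustrative alias, same source column
from   tabNull t1
       cross join tabNull t2
where  t1.val is not null;
```

With the two rows from the setup, this gives two identical rows for 'A', which is the output reported in the question.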
EDIT: I say below that the syntax is illegal; on further thought, that's BS on my part, I don't know that for a fact (I can't point to where in the language definition aliases are required for a self-join). I still believe the explanation below is probably correct, whether it is for the "bug" or for the "undefined behavior" I mention below.
*
The syntax is illegal (you knew that - you were just curious to see what would happen, and if you can understand the output). I agree with jarlh that you should have received an error message. Clearly Oracle didn't code it that way.
Since this is not valid syntax, what you are seeing can't be called a bug (so I disagree with Nick's comment). The behavior is "undefined" - when you use syntax that is not supported by the Oracle language definition, you may get any kind of crazy results, for which Oracle is not taking any responsibility.
OK, with that out of the way, is there any explanation for what you are seeing? I believe it is indeed a Cartesian join, and not a union as Nick suggested.
Let's put ourselves in the optimizer's shoes. It sees the first table in the FROM list, it scans it, so far so good.
Then it reads the second table, and it has a list of columns like this:
tabNULL.val, tabNULL.descr, tabNULL.val, tabNULL.descr
The join condition is tabNULL.val = tabNULL.val
The optimizer is dumb, it is not smart. It, unlike you, doesn't realize at this point that tabNULL is meant to stand for two different incarnations of the table. It thinks tabNULL.val on both sides of the equation is THE SAME value, with both sides referring to the first "incarnation" of the table. The only case when that fails is if tabNULL.val is NULL, so it REWRITES the query with the clause becoming tabNULL.val IS NOT NULL.
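The equivalence behind that rewrite can be checked directly on the sample table. A small sketch, assuming only the two rows from the setup:

```sql
-- A column compared with itself is true for every row except those
-- where the column is NULL, because NULL = NULL is not true.
select count(*) from tabNull where val = val;

-- Hence the condition is equivalent to a plain NOT NULL test.
select count(*) from tabNull where val is not null;
```

Both queries count only the 'A' row.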
Only the FIRST table is checked for tabNULL.val IS NOT NULL; the optimizer doesn't "know" tabNULL.val appears again in the list and it may have a DIFFERENT meaning! Then the join happens; at this point there are no other conditions left, so BOTH rows in the second incarnation of the table will produce rows in the join, for A, ONE CHAR from the first table.
Then, in the projection, again only the FIRST tabNULL.val will be read and will populate BOTH columns in the output. You ask the query engine to return the value tabNULL.val twice, and in your mind it's from different places, but there is only one memory location labeled tabNULL.val, and it stores what came from the first table.
Of course, very few know with any certainty what the optimizer and the query engine do, but in this case I think this is a pretty safe guess.
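Whatever the exact internals, the practical way to avoid the ambiguity is to alias both occurrences of the table and qualify every column that is not the USING column. A minimal sketch; the aliases descr_1 and descr_2 are illustrative:

```sql
-- Self-join with explicit table aliases. The USING column (val) is
-- referenced unqualified; the other columns are qualified so that each
-- one is tied to a specific occurrence of the table.
select val,
       t1.descr as descr_1,   -- illustrative alias
       t2.descr as descr_2    -- illustrative alias
from   tabNull t1
       inner join tabNull t2 using (val);
```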