Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using the same table alias twice in a query

Tags:

sql

oracle

ansi

My coworker, who is new to ANSI join syntax, recently wrote a query like this:

SELECT count(*)
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b)
       JOIN table3 t3 ON
             (t3.col_c = t1.col_c);

Note that table3 is joined to both table1 and table2 on different columns, but the two JOIN clauses use the same table alias for table3.

The query runs, but I'm unsure of it's validity. Is this a valid way of writing this query?

I thought the join should be like this:

SELECT count(*)
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b AND
             t3.col_c = t1.col_c);

Are the two versions functionally identical? I don't really have enough data in our database yet to be sure.

Thanks.

like image 715
AndyDan Avatar asked Jan 03 '18 21:01

AndyDan


Video Answer


2 Answers

The first query is a join of 4 tables, the second one is a join of 3 tables. So I don't expect that both queries return the same numbers of rows.

SELECT *
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b)
       JOIN table3 t3 ON
             (t3.col_c = t1.col_c);
             

The alias t3 is only used in the ON clause. The alias t3 refers to the table before the ON keyword. I found this out by experimenting. So the pervious query is equvivalent to

SELECT *
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b)
       JOIN table3 t4 ON
             (t4.col_c = t1.col_c);

and this can be transfotmed in a traditional join

SELECT *
  FROM table1 t1,
       table2 t2,
       table3 t3,
       table3 t4
where (t1.col_a = t2.col_a)
    and  (t2.col_b = t3.col_b)
    and (t4.col_c = t1.col_c);
    

The second query is

SELECT *
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b AND
             t3.col_c = t1.col_c);
             

This can also transformed in a traditional join

SELECT *
  FROM table1 t1,
    table2 t2,
    table3 t3
where (t1.col_a = t2.col_a)
    and (t2.col_b = t3.col_b)
    AND (t3.col_c = t1.col_c);
    

These queries seem to be different. To proof their difference we use the following example:

create table table1(
    col_a number,
    col_c number
);

create table table2(
    col_a number,
    col_b number
);

create table table3(
    col_b number,
    col_c number
);

insert into table1(col_a, col_c) values(1,3);
insert into table1(col_a, col_c) values(4,3);
insert into table2(col_a, col_b) values(1,2);
insert into table2(col_a, col_b) values(4,2);
insert into table3(col_b, col_c) values(2,3);
insert into table3(col_b, col_c) values(2,5);
insert into table3(col_b, col_c) values(7,9);

commit;

We get the following output

SELECT *
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b)
       JOIN table3 t3 ON
             (t3.col_c = t1.col_c)


| COL_A | COL_C | COL_A | COL_B | COL_B | COL_C | COL_B | COL_C |
|-------|-------|-------|-------|-------|-------|-------|-------|
|     1 |     3 |     1 |     2 |     2 |     3 |     2 |     3 |
|     4 |     3 |     4 |     2 |     2 |     3 |     2 |     3 |
|     1 |     3 |     1 |     2 |     2 |     5 |     2 |     3 |
|     4 |     3 |     4 |     2 |     2 |     5 |     2 |     3 |



             
SELECT *
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b AND
             t3.col_c = t1.col_c)

| COL_A | COL_C | COL_A | COL_B | COL_B | COL_C |
|-------|-------|-------|-------|-------|-------|
|     4 |     3 |     4 |     2 |     2 |     3 |
|     1 |     3 |     1 |     2 |     2 |     3 |

The number of rows retrieved is different and so count(*) is different.

The usage of the aliases was surprising. at least for me.

The following query works because t1 in the where_clause references table2.

select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_b<0;

The following query works because t1 in the where_clause references table1.

select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_c<0;

The following query raises an error because both table1 and table2 contain a column col_a.

select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_a<0;

The error thrown is

ORA-00918: column ambiguously defined

The following query works, the alias t1 refers to two different tables in the same where_clause.

select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_b<0 and t1.col_c<0;

These and more examples can be found here: http://sqlfiddle.com/#!4/84feb/12

The smallest counter example

The smallest counter example is

table1
col_a  col_c
    1      2

table2
col_a  col_b
    1      3

table3
col_b  col_c
    3      5
    6      2    

Here the second query has an empty result set and the first query returns one row. It can be shown that the count(*) of the second query never exeeds the count(*)of the first query.

A more detailed explanation

This behaviour will became more clear if we analyze the following statement in detail.

SELECT t.col_b, t.col_c
  FROM table1 t
       JOIN table2 t ON
            (t.col_b = t.col_c) ;
        

Here is the reduced syntax for this query in Backus–Naur form derived from the syntax descriptions in the SQL Language Reference of Oracle 12.2. Note that under each syntax diagram there is a link to the Backus–Naur form of this diagram, e.g Description of the illustration select.eps. "reduced" means that I left out all the possibilities that where not used, e,g. the select is defined as

select::=subquery [ for_update_clause ] ;

Our query does not use the optional for_update_clause, so I reduced the rule to

select::=subquery

The only exemption is the optional where-clause. I didn't remove it so that this reduced rules can be used to analyze the above query even if we add a where_clause.

These reduced rule will define only a subset of all possible select statements.

select::=subquery 
subquery::=query_block
query_block::=SELECT select_list FROM join_clause [ where_clause ]
join_clause::=table_reference  inner_cross_join_clause ...  
table_reference::=query_table_expression  t_alias query_table_expression::=table
inner_cross_join_clause::=JOIN table_reference ON condition

So our select statement is a query_block and the join_clause is of type

table_reference inner_cross_join_clause

where table_reference is table1 t and inner_cross_join_clause is JOIN table2 t ON (t.col_b = t.col_c). The ellipsis ... means that there could be additional inner_cross_join_clauses, but we do not need this here.

in the inner_cross_join_clause the alias t refers to table2. Only if these references cannot be satisfied the aliasmust be searched in an outer scope. So all the following expressions in the ONcondition are valid:

t.col_b = t.col_c

Here t.col_b is table2.col_b because t refers to the alias of its inner_cross_join_clause, t.col_c is table1.col_c. t of the inner_cross_join_clause (refering to table2) has no column col_c so the outer scope will be searched and an appropriate alias will be found.

If we have the clause

t.col_a = t.col_a

the alias can be found as alias defined in the inner_cross_join_clause to which this ON-condition belongs so t will be resolved to table2.

if the select list consists of

t.col_c, t.col_b, t.col_a

instead of * then the join_clause will be searched for an alias and t.col_c will be resolved to table1.col_c (table2 does not contain a column col_c), t.col_b will be resolved to table2.col_b (table1 does not contain a col_b) but t.col_a will raise the error

ORA-00918: column ambiguously defined

because for the select_list none of the aias definition has a precedenve over the other. If our query also has a where_clause then the aliases are resolved in the same way as if they are used in the select_list.

like image 67
miracle173 Avatar answered Sep 18 '22 00:09

miracle173


With more data, it will produce different results. Your colleagues query is same as this.

select * from table3 where t3.col_b = 'XX'
union
select * from table3 where t3.col_c = 'YY'

or

select * from table3 where t3.col_b = 'XX' or t3.col_c = 'YY'

while your query is like this.

select * from table3 where t3.col_b ='XX' and t3.col_c='YY'

First one is like data where (xx or yy) while second one is data where ( xx and yy)

like image 36
taewoo Avatar answered Sep 19 '22 00:09

taewoo