Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Strange behaviour of full outer join in Oracle - how it could be explained?

I have noticed a strange behaviour of FULL OUTER JOIN in Oracle 11. I was joining tables from HR schema, particularily EMPLOYEES and DEPARTMENTS.

For example, the following query returns 123 rows:

    SELECT * FROM employees e
    FULL JOIN departments d ON e.department_id = d.department_id

However, what is tricky to understand - when I put a set of particular columns in the select clause, the query will return 122 rows (a missing row is for an employee which has no department assigned - the one which is additionally returned with left join in comparison to inner join):

    SELECT first_name, last_name, department_name FROM employees e
    FULL JOIN departments d on e.department_id = d.department_id

Even when I count the rows it returns 122 (COUNT(*))!!! WHAT IS GOING ON? What is the difference between SELECT * and SELECT COUNT(*)?

The explain plan for SELECT * ... :

SELECT STATEMENT                                      122
  VIEW                 VW_FOJ_0                       122
    HASH JOIN                          FULL OUTER     122
      Access Predicates
        E.DEPARTMENT_ID = D.DEPARTMENT_ID
      TABLE ACCESS     DEPARTMENTS     FULL            27
      TABLE ACCESS     EMPLOYEES       FULL           107

and for SELECT COUNT(*) ... :

SELECT STATEMENT                                             1
  SORT                                     AGGREGATE         1
    VIEW               VW_FOJ_0                            122
      HASH JOIN                            FULL OUTER      122
        Access Predicates
          E.DEPARTMENT_ID = D.DEPARTMENT_ID
        INDEX          DEPT_ID_PK          FAST FULL SCAN   27
        INDEX          EMP_DEPARTMENT_IX   FAST FULL SCAN  107
like image 415
Jacek L. Avatar asked Mar 15 '12 10:03

Jacek L.


2 Answers

The optimizer should not be choosing to use the index on EMP.DEPT_ID in the second query, since it can have NULL values. This is what's causing it to exclude one row from the results.

The only non-bug explanation I can think of at the moment is that you have somehow created constraints in DISABLE RELY mode so that the optimizer thinks that the field cannot contain NULLs. In this case it would be correct to use the index given the incorrect information in the constraints. However, it seems the RELY option is not available for NOT NULL constraints, so I don't see how this could be the problem. Nonetheless, take a careful look at all constraints on the tables.

That aside, there are a surprising number of bugs on Oracle's site regarding wrong results from full outer joins. You might be hitting one of them. In quite a few of these cases, the workaround is to disable "native" full outer joins, which you can do for your current session with this statement:

alter session set "_optimizer_native_full_outer_join"=off; 
like image 185
Dave Costa Avatar answered Nov 16 '22 00:11

Dave Costa


(Can't write this in a comment)

The results are conformed to execution plans.

The count(*) execution plan uses the index EMP_DEPARTMENT_IX which contains all dept_ids from employess table. But indexes does not contain nulls. So, this execution plan will "lose" the emps with null department_id.

However, should be explained why Oracle choose this execution plan in case of

select first_name, last_name, department_name

and

select count(*)

in opposition to

select *
like image 29
Florin stands with Ukraine Avatar answered Nov 16 '22 00:11

Florin stands with Ukraine