Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

complement of the intersection in SQL

I'm using Oracle SQL and i have a basic question regarding join command.

I have 5 tables. Each one of them has the same column as a primary key: ID (int). Lets look at the following queries:

select count(*) from table_a - 100 records
select count(*) from table_c - 200 records
select count(*) from table_c - 150 records
select count(*) from table_d - 100 records
select count(*) from table_e - 120 records

select * -- 88 records
 from table_a a
  inner join table b
    on a.id = b.id
  inner join table c
    on a.id = c.id
  inner join table d
    on a.id = d.id
  inner join table e
    on a.id = e.id

In this case, many records are failing to be included in the output if one of the tables is not including the certain ID (even if the rest of them are included). How can i know what are these "bad" records? It's actually the complement of the intersection i think.

I want to know what are the problematic records and tables of each case. For example: ID 123 is a "bad" records since it's not included in table_c, but included in the rest of the tables. ID 321 is a problematic records since it's included in all of the tables except table_d.

like image 570
Omri Avatar asked Dec 14 '22 19:12

Omri


1 Answers

You are probably looking for the symmetric difference between all your tables.

To solve that kind of problem without being too clever, you need a FULL OUTER JOIN ... USING:

SELECT id
    FROM table_a
    FULL OUTER JOIN table_b USING(id) 
    FULL OUTER JOIN table_c USING(id) 
    FULL OUTER JOIN table_d USING(id) 
    FULL OUTER JOIN table_e USING(id) 
WHERE table_a.ROWID IS NULL
   OR table_b.ROWID IS NULL
   OR table_c.ROWID IS NULL
   OR table_d.ROWID IS NULL
   OR table_e.ROWID IS NULL;

The FULL OUTER JOIN will return all rows that satisfy the join condition (like an ordinary JOIN) as well as all rows without corresponding rows. The USING clause embed an implicit COALESCE on the equijoin column.


An other option would be to use an anti-join:

SELECT id
    FROM table_a
    FULL OUTER JOIN table_b USING(id) 
    FULL OUTER JOIN table_c USING(id) 
    FULL OUTER JOIN table_d USING(id) 
    FULL OUTER JOIN table_e USING(id) 
WHERE id NOT IN (
    SELECT id
        FROM table_a
        INNER JOIN table_b USING(id) 
        INNER JOIN table_c USING(id) 
        INNER JOIN table_d USING(id) 
        INNER JOIN table_e USING(id) 
)

Basically, this will build the union all sets minus the intersection of all sets.

Graphically, you can compare the INNER JOIN and the OUTER JOIN (on 3 tables only for ease of representation):

INNER JOINFULL OUTER JOIN


Given that test case:

ID    TABLE_A TABLE_B TABLE_C TABLE_D TABLE_E
1     *       -       -       -       -
2     -       *       *       *       *
3     *       -       -       *       -
4     *       *       *       *       *

* value in the table - missing entry

Both queries will produce:

ID
1
3
2

If you want tabular result, you might adapt one of these query by adding a bunch of CASE expressions. Something like that:

SELECT ID,
    CASE when table_a.rowid is not null then 1 else 0 END table_a,
    CASE when table_b.rowid is not null then 1 else 0 END table_b,
    CASE when table_c.rowid is not null then 1 else 0 END table_c,
    CASE when table_d.rowid is not null then 1 else 0 END table_d,
    CASE when table_e.rowid is not null then 1 else 0 END table_e
FROM table_a
    FULL OUTER JOIN table_b USING(id) 
    FULL OUTER JOIN table_c USING(id) 
    FULL OUTER JOIN table_d USING(id) 
    FULL OUTER JOIN table_e USING(id) 
WHERE table_a.ROWID IS NULL
   OR table_b.ROWID IS NULL
   OR table_c.ROWID IS NULL
   OR table_d.ROWID IS NULL
   OR table_e.ROWID IS NULL;

Producing:

ID    TABLE_A TABLE_B TABLE_C TABLE_D TABLE_E
1     1       0       0       0       0
3     1       0       0       1       0
2     0       1       1       1       1

1 value in the table 0 missing entry

like image 117
Sylvain Leroux Avatar answered Dec 28 '22 07:12

Sylvain Leroux