I'm using Oracle SQL and i have a basic question regarding join
command.
I have 5 tables. Each one of them has the same column as a primary key: ID (int)
. Lets look at the following queries:
select count(*) from table_a - 100 records
select count(*) from table_c - 200 records
select count(*) from table_c - 150 records
select count(*) from table_d - 100 records
select count(*) from table_e - 120 records
select * -- 88 records
from table_a a
inner join table b
on a.id = b.id
inner join table c
on a.id = c.id
inner join table d
on a.id = d.id
inner join table e
on a.id = e.id
In this case, many records are failing to be included in the output if one of the tables is not including the certain ID (even if the rest of them are included). How can i know what are these "bad" records? It's actually the complement of the intersection i think.
I want to know what are the problematic records and tables of each case. For example: ID 123 is a "bad" records since it's not included in table_c, but included in the rest of the tables. ID 321 is a problematic records since it's included in all of the tables except table_d.
You are probably looking for the symmetric difference between all your tables.
To solve that kind of problem without being too clever, you need a FULL OUTER JOIN ... USING
:
SELECT id
FROM table_a
FULL OUTER JOIN table_b USING(id)
FULL OUTER JOIN table_c USING(id)
FULL OUTER JOIN table_d USING(id)
FULL OUTER JOIN table_e USING(id)
WHERE table_a.ROWID IS NULL
OR table_b.ROWID IS NULL
OR table_c.ROWID IS NULL
OR table_d.ROWID IS NULL
OR table_e.ROWID IS NULL;
The FULL OUTER JOIN
will return all rows that satisfy the join condition (like an ordinary JOIN
) as well as all rows without corresponding rows. The USING
clause embed an implicit COALESCE
on the equijoin column.
An other option would be to use an anti-join:
SELECT id
FROM table_a
FULL OUTER JOIN table_b USING(id)
FULL OUTER JOIN table_c USING(id)
FULL OUTER JOIN table_d USING(id)
FULL OUTER JOIN table_e USING(id)
WHERE id NOT IN (
SELECT id
FROM table_a
INNER JOIN table_b USING(id)
INNER JOIN table_c USING(id)
INNER JOIN table_d USING(id)
INNER JOIN table_e USING(id)
)
Basically, this will build the union all sets minus the intersection of all sets.
Graphically, you can compare the INNER JOIN
and the OUTER JOIN
(on 3 tables only for ease of representation):
Given that test case:
ID TABLE_A TABLE_B TABLE_C TABLE_D TABLE_E 1 * - - - - 2 - * * * * 3 * - - * - 4 * * * * *
*
value in the table-
missing entry
Both queries will produce:
ID
1
3
2
If you want tabular result, you might adapt one of these query by adding a bunch of CASE
expressions. Something like that:
SELECT ID,
CASE when table_a.rowid is not null then 1 else 0 END table_a,
CASE when table_b.rowid is not null then 1 else 0 END table_b,
CASE when table_c.rowid is not null then 1 else 0 END table_c,
CASE when table_d.rowid is not null then 1 else 0 END table_d,
CASE when table_e.rowid is not null then 1 else 0 END table_e
FROM table_a
FULL OUTER JOIN table_b USING(id)
FULL OUTER JOIN table_c USING(id)
FULL OUTER JOIN table_d USING(id)
FULL OUTER JOIN table_e USING(id)
WHERE table_a.ROWID IS NULL
OR table_b.ROWID IS NULL
OR table_c.ROWID IS NULL
OR table_d.ROWID IS NULL
OR table_e.ROWID IS NULL;
Producing:
ID TABLE_A TABLE_B TABLE_C TABLE_D TABLE_E 1 1 0 0 0 0 3 1 0 0 1 0 2 0 1 1 1 1
1
value in the table0
missing entry
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With