Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle - Partially nullable composite foreign key

I have a legacy Oracle database that has a strange quirk I wanted to make sense of. It has a composite foreign key where some columns are nullable. To me this smells like a bad design by a careless developer, but I wanted to ask for opinions. Of course the original development team is long gone.

The table is much larger in terms of columns but I think I was able to distill the issue in the example below:

create table quadrant (
  region number(9) not null,
  area number(9) not null,
  caption varchar2(20),
  primary key (region, area)
);

insert into quadrant (region, area, caption) values (10, 123, 'Chicago');
insert into quadrant (region, area, caption) values (10, 125, 'Wisconsin');

create table farm (
  id number(9),
  region_id number(9) not null,
  area_id number(9),
  name varchar2(50),
  constraint fk_region_area foreign key (region_id, area_id)
    references quadrant (region, area)
);

insert into farm (id, region_id, area_id, name) values (5, 10, null, 'farm 1');
insert into farm (id, region_id, area_id, name) values (6, 11, null, 'farm 2');

select * from farm;

Result:

ID  REGION_ID  AREA_ID  NAME
--  ---------  -------  ------
5   10         <null>   farm 1  <-- Does it point to anything?
6   11         <null>   farm 2  <-- Region 11 doesn't even exist!

If parts of a foreign key are null, then what meaning does it have?

  • Is REGION_ID, AREA_ID = (10, null) pointing to anything, or it's just useless info?
  • Is REGION_ID, AREA_ID = (11, null) pointing to anything? I don't think so.

I am tempted to add a constraint to enforce ALL or NONE in terms of null values on the foreign key. Does it make sense?

But above all, what's the use case for this "feature"?

like image 820
The Impaler Avatar asked Sep 13 '18 18:09

The Impaler


1 Answers

Thanks to all answers and comments. This issue I have forced me to learn something new, and that's a good thing. @philipxy gave me the big clue. I wanted to recap what I learned since it will probably be of use to someone else, and this is a good place to record it.

This question has two sides: First, what a partially null foreign key means, and second how it's implemented.

Meaning of a Partially Null Foreign Keys

There's a lot of debate of what does this mean -- as @agiles231 points out. NULL could mean:

  • a value is unknown.
  • others say it means the value is invalid.
  • others say NULL is a bona fide value per se.

In short, there's no clear answer to its meaning as of yet.

I guess depending on how people interpret null values, then the strategy to use them in foreign keys (and to validate them) could be different.

Implementation of Partially Null Foreign Keys

The SQL-92 Standard defines (section 4.10.2) three different ways of matching composite foreign keys with nullable values:

  • Match SIMPLE: If any column of the composite foreign key is null, then the foreign key is accepted, stored, but not validated against the referenced table. This is usually the default mode databases offer. In the SQL-92 standard this mode is described but not named.

  • Match PARTIAL: If any column of the composite foreign key is null, then each non-null column is matched against the referenced table to check there is at least one row where the value is present. I've seen no database implementing this mode.

  • Match FULL: Partially null foreign keys are not accepted. Either the foreign key is fully null or fully not null. When null, there's no validation against the referenced table. When not null, it's fully validated against the referenced table. This is what I was expecting as a default behavior (in my blissful ignorance).

Well, I checked how 10 different databases implemented these modes and here's what I found:

Database Engine  Match SIMPLE  Match PARTIAL  Match FULL
---------------  ------------  -------------  ----------
Oracle 12c1      YES*1         NO             NO
DB2 10.5         YES*1         NO             NO
PostgreSQL 10    YES*1         NO             YES
SQL Server 2014  YES*1         NO             NO
MariaDB 10.3     YES*1         NO*2           NO*2
MySQL 8.0        YES*1         NO*2           NO*2
Sybase ASE 16    YES*1         NO             YES
H2 1.4           YES*1         NO             NO
Derby 10.13      YES*1         NO             NO
HyperSQL 2.3     YES*1         NO             YES

*1 This is the default mode.

*2 Accepted when creating a table, but ignored.

In short:

  • All tested databases behave the same way by default: they default to Match SIMPLE.

  • No database I tested supports Match PARTIAL. I guess it makes sense since I personally find little use for it. Moreover, it could become prohibitively expensive to perform partial validation on separate foreign key columns without creating all possible index combinations on the referenced table.

  • PostgreSQL implements Match FULL as well as Sybase ASE. That's great news! Surprisingly HyperSQL (this tiny database) does too.

Workaround to implement Match FULL

The good news is there's an rather easy workaround to implement Match FULL if you happen to need it, in any of the tested databases. Just add a table constraint that allows either all null columns, or all non-null. Something like:

create table farm (
  id int,
  region_id int,
  area_id int,
  name varchar(50),
  constraint fk_region_area foreign key (region_id, area_id)
    references quadrant (region, area),
  constraint fkfull_region_area check ( -- here's the workaround
    region_id is null and area_id is null or
    region_id is not null and area_id is not null)
);

insert into farm (id, region_id, area_id, name) values (5, 10, null, 'farm 1'); -- fails

insert into farm (id, region_id, area_id, name) values (6, 11, null, 'farm 2'); -- fails

insert into farm (id, region_id, area_id, name) values (7, 10, 125, 'farm 3'); -- succeeds

insert into farm (id, region_id, area_id, name) values (8, null, null, 'farm 4'); -- succeeds

It works pretty neat.

Finally, and as a very personal opinion, I would have expected Match FULL to be the default matching strategy. Maybe it's just that to me allowing (by default) foreign keys that do not point to other rows encourages errors in the applications that use the database.

I think that most developers will understand FULL easily, compared to SIMPLE. And PARTIAL is way more complicated, and potentially error prone. Just my opinion.

like image 192
The Impaler Avatar answered Sep 29 '22 12:09

The Impaler