I am building a system that is a central repository for storing data from a number of other systems. A sync process is required to update the central repository when the other systems' data is updated. There will be a sync_action table to identify which system the central repo needs to sync with and the type of sync required. There is a set of defined actions that is very unlikely to change. A slimmed-down version of the system is below.
As I see it, I can approach this in two ways:
Option 1) Have an Action table that holds the 3 available actions, and a sync_action table that uses a foreign key to reference the required action.
Table: System
ID Description
1 Slave System 1
2 Slave System 2
Table: Action
ID Description
1 Insert
2 Update
3 Delete
Table: Sync_action
ID Action System
1 1 1
2 2 1
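A minimal sketch of Option 1, using Python's built-in sqlite3 module and an in-memory SQLite database. The question names no DBMS, so SQLite is an assumption, and the lowercase table and column names are taken from the tables above. The foreign key rejects an action id that has no matching row in the action table.

```python
import sqlite3

# Option 1 (sketch): a small reference table plus a FOREIGN KEY.
# SQLite only enforces foreign keys when this pragma is on for the connection.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE system (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE action (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL UNIQUE
);
CREATE TABLE sync_action (
    id     INTEGER PRIMARY KEY,
    action INTEGER NOT NULL REFERENCES action(id),
    system INTEGER NOT NULL REFERENCES system(id)
);
INSERT INTO system VALUES (1, 'Slave System 1'), (2, 'Slave System 2');
INSERT INTO action VALUES (1, 'Insert'), (2, 'Update'), (3, 'Delete');
INSERT INTO sync_action VALUES (1, 1, 1), (2, 2, 1);
""")

# Action id 4 has no row in the action table, so the insert is refused.
try:
    conn.execute("INSERT INTO sync_action VALUES (3, 4, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("unknown action rejected:", rejected)
```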
Option 2) Instead of a foreign key, use a check constraint on the sync_action.action column so that only the actions Insert/Update/Delete can be inserted.
Table: Sync_action
ID Action System
1 Insert 1
2 Update 1
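A comparable sketch of Option 2, again assuming SQLite through Python's sqlite3 module. There is no reference table here: the allowed values are listed in a CHECK constraint. For brevity the system column is a plain integer, and the helper try_insert is hypothetical, written only for this illustration.

```python
import sqlite3

# Option 2 (sketch): the permitted actions live in a CHECK constraint.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sync_action (
    id     INTEGER PRIMARY KEY,
    action TEXT NOT NULL CHECK (action IN ('Insert', 'Update', 'Delete')),
    system INTEGER NOT NULL
);
INSERT INTO sync_action VALUES (1, 'Insert', 1), (2, 'Update', 1);
""")

def try_insert(row_id, action, system):
    """Return True if the row was accepted, False if a constraint refused it."""
    try:
        conn.execute("INSERT INTO sync_action VALUES (?, ?, ?)",
                     (row_id, action, system))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(3, "Delete", 1))  # value in the list: accepted
print(try_insert(4, "Upsert", 1))  # value not in the list: refused
```

Changing the list later means altering the constraint itself. Most DBMSs do this by dropping and re-adding it. SQLite's ALTER TABLE cannot modify a CHECK, so there the table would have to be rebuilt.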
I would like to know what factors determine which approach is better when choosing between these two integrity constraints: a foreign key or a check constraint. There have been similar threads, but I didn't find them definitive enough. This may be because it's up to interpretation, but any thoughts would be appreciated.
Cheers
The commenters seem to unanimously agree:
It's generally better to have a FOREIGN KEY constraint to a (more or less static) reference table. Reasons:
The constraint is easily "extendable". To add or remove an option, you only have to add or remove a row from the reference table. You don't have to drop the constraint and recreate it. This matters even more if the same constraint applies to similar columns in other tables.
You can attach extra information (more columns) that applications can read if needed.
ORMs can deal better with (read: be aware of) these constraints. They just have to read a table, not the metadata.
If you want to change the action codes, a foreign key declared with ON UPDATE CASCADE will propagate the change to other (possibly many) referencing tables. No need to write UPDATE queries.
One particular DBMS has not yet implemented CHECK constraints (shame), although it does support FOREIGN KEY ones.
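Two of the reasons above can be sketched in SQLite through Python's sqlite3 module: extending the list of actions, and cascading a changed code. The three-letter codes and the extra 'TRN'/'Truncate' action are hypothetical, chosen only for illustration. Adding an option is a plain INSERT into the reference table. Renaming a code is a single UPDATE that ON UPDATE CASCADE carries into the referencing rows.

```python
import sqlite3

# Sketch: extending the reference table, and cascading a renamed code.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce/cascade
conn.executescript("""
CREATE TABLE action (
    code        TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE sync_action (
    id     INTEGER PRIMARY KEY,
    action TEXT NOT NULL REFERENCES action(code) ON UPDATE CASCADE,
    system INTEGER NOT NULL
);
INSERT INTO action VALUES ('INS', 'Insert'), ('UPD', 'Update'), ('DEL', 'Delete');
INSERT INTO sync_action VALUES (1, 'INS', 1), (2, 'UPD', 1);
""")

# Extending the set of actions: one new row, no constraint to drop and recreate.
# 'TRN' / 'Truncate' is a hypothetical action used only for this example.
conn.execute("INSERT INTO action VALUES ('TRN', 'Truncate')")
conn.execute("INSERT INTO sync_action VALUES (3, 'TRN', 2)")

# Renaming a code in the reference table updates every referencing row.
conn.execute("UPDATE action SET code = 'U' WHERE code = 'UPD'")
conn.commit()

codes = [row[0] for row in conn.execute("SELECT action FROM sync_action ORDER BY id")]
print(codes)  # ['INS', 'U', 'TRN']
```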
As @pst mentioned (and I prefer this approach very much), you can use a sensible code instead of a surrogate integer ID. So your tables could be:
Table: System
SystemID Description
1 Slave System 1
2 Slave System 2
Table: Action
ActionCode Description
I Insert
U Update
D Delete
Table: SyncAction
ID ActionCode SystemID
1 I 1
2 U 1
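The layout above can be sketched in SQLite through Python's sqlite3 module, with table and column names taken from the tables shown. The sync rows stay readable without a join, and the foreign key still rejects an unknown code such as 'X'.

```python
import sqlite3

# Sketch: meaningful ActionCode keys instead of surrogate integers,
# still enforced by FOREIGN KEY constraints.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE System (
    SystemID    INTEGER PRIMARY KEY,
    Description TEXT NOT NULL
);
CREATE TABLE Action (
    ActionCode  TEXT PRIMARY KEY,
    Description TEXT NOT NULL
);
CREATE TABLE SyncAction (
    ID         INTEGER PRIMARY KEY,
    ActionCode TEXT NOT NULL REFERENCES Action(ActionCode),
    SystemID   INTEGER NOT NULL REFERENCES System(SystemID)
);
INSERT INTO System VALUES (1, 'Slave System 1'), (2, 'Slave System 2');
INSERT INTO Action VALUES ('I', 'Insert'), ('U', 'Update'), ('D', 'Delete');
INSERT INTO SyncAction VALUES (1, 'I', 1), (2, 'U', 1);
""")

# The rows read directly: no join is needed to see which action is meant.
rows = conn.execute(
    "SELECT ID, ActionCode, SystemID FROM SyncAction ORDER BY ID").fetchall()
print(rows)  # [(1, 'I', 1), (2, 'U', 1)]

# An unknown code is still refused by the foreign key.
try:
    conn.execute("INSERT INTO SyncAction VALUES (3, 'X', 1)")
    bad_code_accepted = True
except sqlite3.IntegrityError:
    bad_code_accepted = False
```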
I think you're confusing the difference between a foreign key constraint and a check constraint.
A foreign key constraint is there to enforce referential integrity, and a check constraint restricts a column to valid values. In your case this may seem like a minor difference, but if we abstract it slightly I hope to make it clearer.
Consider a table users with the columns user_id, user_name, address_id, join_date, active and last_active_month. I recognise that this is not necessarily the best way of doing things, but it'll serve for the point I'm trying to make.
In this case it would be patently ridiculous to enforce address_id with a check constraint: this column could have any number of values. However, active, assuming we want a boolean y/n, can only have two possible values, and last_active_month can only have 12 possible values. In both these cases it's completely ridiculous to have a foreign key. There are only a certain number of values, and by the definition of the data you are storing, these values cannot change.
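The users example can be sketched in SQLite through Python's sqlite3 module. The column types, the addresses table and the helper accepted are assumptions made for illustration. The open-ended address_id uses a foreign key. The two fixed-domain columns use CHECK constraints.

```python
import sqlite3

# Sketch of the users example: fixed domains get CHECKs, open-ended ones an FK.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE addresses (
    address_id INTEGER PRIMARY KEY
);
CREATE TABLE users (
    user_id           INTEGER PRIMARY KEY,
    user_name         TEXT NOT NULL,
    -- any number of possible values: reference another table
    address_id        INTEGER REFERENCES addresses(address_id),
    join_date         TEXT NOT NULL,
    -- exactly two possible values
    active            TEXT NOT NULL CHECK (active IN ('y', 'n')),
    -- exactly twelve possible values
    last_active_month INTEGER CHECK (last_active_month BETWEEN 1 AND 12)
);
INSERT INTO addresses VALUES (10);
""")

def accepted(sql, params):
    """Return True if the statement succeeds, False on a constraint violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

insert = "INSERT INTO users VALUES (?, ?, ?, ?, ?, ?)"
print(accepted(insert, (1, "alice", 10, "2024-01-15", "y", 3)))    # True
print(accepted(insert, (2, "bob", 10, "2024-02-01", "maybe", 3)))  # False: active
print(accepted(insert, (3, "carol", 10, "2024-03-01", "n", 13)))   # False: month
```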
In your case, while you could go for a check constraint, unless you can be absolutely certain that the number of actions will never change, a foreign key is the correct way to go.
On a slightly separate matter, and as @pst mentioned, I see you've been eaten by the surrogate key monster. While surrogate keys can improve performance, in a table of the size you're envisaging (3 values: insert/update/delete), or even a larger one, all they do is obscure what you're trying to achieve.
It's not easy to look at
ID Action System
1 1 1
2 2 1
and see what's going on, but:
ID Action System
1 insert 1
2 update 1
is far easier to read. You may also want to consider doing the same for the system column. I probably would, though the number of possible values is somewhat larger there. Just my personal thoughts on the matter...
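The readability point can be sketched in SQLite through Python's sqlite3 module. The table names action_by_id, action_by_name, sync_surrogate and sync_natural are hypothetical. With surrogate ids, the sync rows only become meaningful after a join to the lookup table. With the action stored as its name, which a foreign key still enforces, they read directly.

```python
import sqlite3

# Sketch: the same sync rows stored with surrogate ids vs. meaningful values.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE action_by_id   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE action_by_name (name TEXT PRIMARY KEY);
CREATE TABLE sync_surrogate (
    id     INTEGER PRIMARY KEY,
    action INTEGER NOT NULL REFERENCES action_by_id(id),
    system INTEGER NOT NULL
);
CREATE TABLE sync_natural (
    id     INTEGER PRIMARY KEY,
    action TEXT NOT NULL REFERENCES action_by_name(name),
    system INTEGER NOT NULL
);
INSERT INTO action_by_id VALUES (1, 'insert'), (2, 'update'), (3, 'delete');
INSERT INTO action_by_name VALUES ('insert'), ('update'), ('delete');
INSERT INTO sync_surrogate VALUES (1, 1, 1), (2, 2, 1);
INSERT INTO sync_natural   VALUES (1, 'insert', 1), (2, 'update', 1);
""")

# Surrogate ids: the raw rows say nothing until joined to the lookup table.
raw = conn.execute("SELECT id, action, system FROM sync_surrogate ORDER BY id").fetchall()
decoded = conn.execute("""
    SELECT s.id, a.name, s.system
    FROM sync_surrogate AS s JOIN action_by_id AS a ON a.id = s.action
    ORDER BY s.id
""").fetchall()
# Meaningful values: the raw rows are already readable.
natural = conn.execute("SELECT id, action, system FROM sync_natural ORDER BY id").fetchall()

print(raw)      # [(1, 1, 1), (2, 2, 1)]
print(decoded)  # [(1, 'insert', 1), (2, 'update', 1)]
print(natural)  # [(1, 'insert', 1), (2, 'update', 1)]
```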