Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Enforcing supertype & subtype data integrity without stored procedures, triggers, or UDFs

I run a small food production business and I need to manage customer orders. I have built a conceptual data model of this aspect of my business but I need some pointers on how to fully implement this in an RDMS.

As a first step, I have come up with the logical model given below. My knowledge of data modeling is limited, so there may be errors in my diagram but hopefully it conveys my intention. Note that this is merely a simplified portion of a larger schema and I'm presenting only the relevant tables for the sake of simplicity.

database schema diagram

The data model in brief

  • A customer Ordr can have one or more OrdrItems
  • An OrdrItem can be either a FoodItem or a ComboItem
  • A ComboItem is a logical grouping of two or more FoodItems

I have implemented the above schema in MySQL and I have written a few small programs to populate the tables with customer orders. This gets the job done but without much thought towards data integrity. I have noticed that with this implementation some data integrity rules are not enforced at the database level.

For example, FoodItem is a subtype of OrdrItem. For each row in FoodItem there must be exactly one corresponding row in OrdrItem. However, in its current implementation, I can delete a row from FoodItem thus leaving a row in OrdrItem without a corresponding row in one of the subtype tables. This should be disallowed.

Some further data constraints

  • An order must have at least one associated "order item" (i.e., an order cannot be empty)
  • An order item must be of exactly one associated subtype of either FoodItem or ComboItem (i.e., an OrderItem cannot be both a FoodItem and ComboItem).
  • Some further constraints will probably arise in the future

I would like these data integrity rules to be baked into the database, such that I don't have to concern myself with their enforcement in every new client application, throw-away script or shoddy SQL statement that I write. I suspect that if I don't make these guarantees at the database level, I greatly increase my chances of running into data integrity issues down the road.

The question

I have only the faintest knowledge of stored procedures, triggers, and user-defined functions. I get the impression that some or all of these features can help me achieve what I want. However, if I can get the job done with check constraints, foreign keys and relatively simple features alone, I'll gladly go that route. Basically, I want to limit complexity as much as possible and not pull in every nifty database feature if it isn't warranted. Is it possible to ensure the data integrity I want without resorting to stored procedures, triggers, user-defined functions and other more esoteric database features?

I'm willing to use either MySQL or Postgresql to implement my solution as I have a basic working knowledge of both systems.

Finally, if this kind of approach towards data integrity is considered overkill, or if there is a far more pragmatic but slightly imperfect solution, I'm open to that as well.

like image 281
ntwk Avatar asked Nov 02 '22 01:11

ntwk


1 Answers

Unfortuntately, "modern" DBMSes don't directly support all of the fancy symbols1 that you can put into an ER diagram. All that the physical FOREIGN KEY actually enforces is that the child row cannot exist without parent, which gets you a plain "1 to 0 or N" relationship2.

You can...

  • make FK NULL-able to morph the left side of the relationship to "0 or 1"
  • and/or you can put a key on top of FK to morph the right side to "0 or 1"

...but that's about all you can do "out of box".

To enforce other rules3, you'll either have to significantly "uglify" the model and probably employ deferred constraints4, or you can do it do it in procedural code5.

While your instincts about putting as much as possible of the integrity rules in the database itself are spot on, it is nonetheless considered a lesser evil to just enforce "unusual" cases in procedural code instead of twisting the data model into a pretzel to accommodate the limitations of declarative constraints.

In fact, one of the most popular techniques is to create an "API":

  • forbid the clients from modifying the tables directly (by revoking the appropriate permissions)
  • and allow them to only modify the data through stored procedures that you wrote and which enforce all the necessary business rules6. That way, you funnel all the clients through the same "clearing house" and nobody can misbehave.

That's a rather "heavy-duty" solution though, and may not be worth the trouble if the scenario is simple enough. If your application is the only one that is ever going to modify the database, then just implementing the rules in the client code may be enough...


1 Such as inheritance (aka. subtype, category, generalization hierarchy).

2 Left side: any given child must have "1" parent. Right side: any given parent must have "0 or N" children.

3 Such as exclusivity and presence of children, which is important for inheritance, as you already noted.

4 Which are supported by PostgreSQL but not MySQL.

5 In order of preference:

  • triggers and stored procedures
  • middle-tier
  • or client.

6 But be wary of race conditions: transaction isolation will protect you from some of them, but nut not all, and you may need to do some explicit locking.

like image 91
Branko Dimitrijevic Avatar answered Nov 09 '22 07:11

Branko Dimitrijevic