Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is using a common-lookup table to restrict the status of entity wrong?

Tags:

According to Five Simple Database Design Errors You Should Avoid by Anith Sen, using a common-lookup table to store the possible statuses for an entity is a common mistake.

Edit + Answer: The figures in Anith's article aren't well labelled - I thought both Figure 1 and Figure 2 are examples of bad design, whereas Figure 2 is good design. Phew, got worried there for a moment.

In summary:

  • Lookup tables: good.
  • Common-lookup tables: bad.

I'll keep my question below for reference.


The following reasons are given:

  1. "You lose the means to ensure accurate data; constraints. By combining different entities into a single table, you have no declarative means to restrain values of a certain category."
    How is constraining values losing accuracy?

  2. "You are forced to represent every data type as a string with this type of generic lookup table."
    If I want to represent another data type, I can add a column for it to my lookup table.

  3. "You commit yourself to rigidity and subsequent complexity."
    How?

  4. Fourthly and finally, you are faced with the physical implementation issues.
    I don't see why.

I disagree with most of the reasons given and would like some objective critique on my incorrect? logic.

My examples:

Citing the example of jobs at a repair service with many possible statuses that generally have a natural flow, let's take a JobStatus table:

  1. Booked In
  2. Assigned to Technician
  3. Diagnosing problem
  4. Waiting for Client Confirmation
  5. Repaired & Ready for Pickup
  6. Repaired & Couriered
  7. Irreparable & Ready for Pickup
  8. Quote Rejected

Arguably, some of these statuses can be normalised to tables like Couriered Items, Completed Jobs and Quotes (with Pending/Accepted/Rejected statuses), but that feels like unnecessary schema complication.

Another common example would be an OrderStatus table to restrict the status of an order:

  1. Pending
  2. Completed
  3. Shipped
  4. Cancelled
  5. Refunded

The status titles and descriptions are in one place for editing and are easy to scaffold as a drop-down with a foreign key for dynamic data applications. This has worked well for me in the past. If the business rules dictate the creation of a new order status, I can just add it to OrderStatus table, without rebuilding my code.

Why is this a bad practice?


Edits: I added Anith's reason to my question and tried to remain objective.

--

like image 939
Petrus Theron Avatar asked Apr 22 '10 14:04

Petrus Theron


People also ask

What is the purpose of a lookup table?

Lookup tables provide a way to add extra information in an event. A lookup table consists of a list of keys and values. You define a lookup table using the table function, and access the table using the lookup function. You can create a lookup table in the rules file or in a separate file.

When designing a database Why is it not preferable to store all data in one table?

Answer and Explanation: Storing all data in one single table will be confusing, may have security issues and there will be duplication in recording.

Should I use lookup table?

Lookup tables provide the essential function of helping you maintain data integrity in your database environment. For example, if you have users entering their gender into a data item, the table that contains the Gender item can reference a lookup table to verify that only the value M or F is used.

What is the use of lookup in Oracle?

A lookup associates values used by one application for a specific field to the values used by other applications for the same field. This provides the capability to map values across vocabularies or systems. For example, you can map country codes, city codes, currency codes, and so on.


2 Answers

What Anith Sen is advising against is having a single lookup table for all lookup codes. That is the significance of the category column in his example. Having a separate table for each category is definitely the way to go.

This is because:

  1. we can use the look-up tables to restrict values through enforce foreign keys
  2. it makes it easier for the database to optimize queries which join data tables with look-up tables
  3. it scales better: one large lookup category can really skew performance

In your examples JobStatus and OrderStatus are separate categories. applicable to separate entities. That is why they need different look-up tables. There is not even a problem with sharing the same code table across several different data tables. Where it becomes problematic is when we have separate data tables (entities) for which some statuses are not appropriate: that is the time to split the codes out into separate look-up tables.

edit

I see you have edited your post to cite all of Anith's points. I think the most important point one is the first one, regarding constraints. If you want to restrict the ORDERS.STATUS column to have values from the OrderStatus category then you have to have a separate table to enforce a foreign key. Your alternatives are:

  • include a CodeCategory column on the ORDERS table and enforce a compound foreign key against the common CODES table, which now needs a unique key of (Category,Code).
  • duplicate the OrderStatus values in a check constraint
  • don't enforce the values in the database, and rely on the application's drop-down list to restrict the values.

All of those options suck, from the database perspective.

like image 61
APC Avatar answered Sep 19 '22 14:09

APC


You already have a correct answer, so this remark is extra.

The big problem with OTLT (One true lookup table) is that you end up placing values from different domains in the same column, and then using a separate column to disambiguate.

In your examples, you have used numbers alongside each status description. If those numbers are numeric codes, as I think they must be, then you do not want the value 4, meaning "waiting for client confirmation" in the same column as the value 4, meaning "Cancelled". If you do this, then you can't use this column as the PK for your one true lookup table.

If you give your one true lookup table another column, call it "CodeType" and use "CodeType" and "Code" as a compound PK, you have introduced more complexity than you introduce by having a separate lookup table for each code type.

The short answer is this: don't put values from different domains in the same column. It always causes more trouble than it saves.

Incidentally, it's possible to create a view that combines all the separate lookup tables into what appears to be a single giant lookup table. This can be useful in certain very unusual situations.

like image 36
Walter Mitty Avatar answered Sep 19 '22 14:09

Walter Mitty