Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why would you store a delimited list in a SQL text column?

I have to maintain an application that has a lot of columns that are of the text data type, with multiple values inserted into them delimited with commas, slashes or sometimes even the pipe (|) character. I'm trying to figure out why on Earth you would ever want to do this.

For an example, an orders table has a column called details that contains information like:

2x #ABC-12345  Widget, Black: $24.99 /4x #ABC-12344 Widget, Blue: $23.50

where the / separates the line items; there is VBScript code which reads the value from a recordset and parses it out in a For loop for display using something like (and this is pretty much exactly how the code reads, variable names and all) arydtls = split(rstmp("details"), "/"). This method is repeated throughout the code for various tables.

It seems to me like it would be 100x better (not to mention easier to work with) to just have the details in a separate table and link back to it (funnily enough, for Orders it does do this, but the data doesn't always match the details text field because the OrderDetail table is updated in code; the details field is treated as read-only in the application).

Did my predecessor know something I didn't, or am I right to be saying "WTF?!!" when I look at this schema? It seems like it's insanely inefficient and difficult to maintain like this, and it makes running reports extra difficult because the data I need could be contained in text fields OR it might be in one of a dozen tables that have similar information and are used in different parts of the application.

like image 963
Wayne Molina Avatar asked Feb 10 '09 16:02

Wayne Molina


3 Answers

The two likeliest scenarios are:

  • Your predecessor was incompetent / didn't understand normalization
  • Your predecessor ran into some performance problems with the normalized structure and found this method was an improvement

Since normalization can often be very expensive when it comes to query operations, we can sometimes get performance gains by eliminating an expensive join and doing the manipulations on the application side against a single row.

There is no absolute rule for database design that says "storing delimited values in a single row is better for this scenario". It's all about testing against your specific data sets and your usage patterns and making improvements where necessary.

In my experience it's not very common for this pattern to be an improvement over normalization though... that's pretty atypical.

Edit: A third possibility is that having n-values per row was a change from the original schema, and instead of adding a new table your predecessor resized the column. That's not necessarily different from the "incompetent" option :) but there are sometimes political pressures involved in db schema changes...

like image 115
Rex M Avatar answered Sep 28 '22 02:09

Rex M


Did my predecessor know something I didn't, or am I right to be saying "WTF?!!" when I look at this schema?

No, your predecessor didn't. Yes, you are right. See note at the end, however.

It seems like it's insanely inefficient and difficult to maintain like this, and it makes running reports extra difficult because the data I need could be contained in text fields OR it might be in one of a dozen tables that have similar information and are used in different parts of the application.

It is insanely inefficient. See note at the end, however.

A column should always be an indivisible attribute of the row. I see two copies of three (maybe four) attributes in this column you've shown:

2x #ABC-12345  Widget, Black: $24.99 /4x #ABC-12344 Widget, Blue: $23.50
  • quanity (2x/4x).
  • code (#ABC-12345/#ABC-12344).
  • description (Widget, Black:/Widget, Blue:) [may be description and color attributes].
  • price ($24.99/$23.50).

This would have been better designed as:

StockItems
    Code char(10) primary key
    Desc varchar(50)
Transaction
    TxnId something primary key
    : : :
TransactionPart
    TxnId something \
    TxnSeq int      / primary key
    Quantity integer
    Code char(10) foreign key StockItems(Code)
    Price float

NOTE:

It's possible that this was done to preserve historical information in the face of changing values elsewhere in the database. For example, if the description of a stock item changes or the item is deleted.

However, that's still not the right way to handle it. In that case, foreign key constraints would have stopped the item code from being deleted and processes should have been in place to prevent the description from being updated (such as versioning the stock item codes).

Of course, if you're never going to search on any of the items within that column, this is perfectly valid, though unwise in terms of possible future functionality to search on them.

Maybe the only thing ever searched on in this table is the customer code - then a free-format text field is adequate.

I still wouldn't do it that way but a YAGNI argument can be made that it would be better to change the DB schema in the future, if and when that search functionality needs to be added.

like image 23
paxdiablo Avatar answered Sep 28 '22 02:09

paxdiablo


Quite simply he either had a reason or he didn't, without asking its impossible to know. If you make the assumption that he wasn't a total idea and thing of some possible reasons then maybe its one of the following.

If the data was for information only and "never going to change" as you hear so often then it may have been a quick win just to throw a display string straight to the field. After all, just replacing pipes with Tabs and slashes with BR's to put it on the screen is incredibly easy. If the code had to be written extremely quickly then this might've been the easy option.

A new feature since SQL 2005 is the XML data type. A major use of this is that you can store and index an unknown number of values against a particular record. You might care about the colour of one thing, the dimensions of another, the weight of something else. You might not be able to produce a definitive list of these things and a truly normalised generic method of storing this data may be too slow or overcomplicated for the system. This may have been a work around to try and get similar functionality.

The key thing here is, most things are done for a reason. You've looked at it the right way in trying to find out this reason. You might come accross it one day and think "Oh yeah!". Just looking at something from your own perspective can often lead to a can't see the wood for the trees scenario.

like image 37
Robin Day Avatar answered Sep 28 '22 02:09

Robin Day