I have to maintain an application that has a lot of columns that are of the text data type, with multiple values inserted into them delimited with commas, slashes or sometimes even the pipe (|) character. I'm trying to figure out why on Earth you would ever want to do this.
For an example, an orders table has a column called details that contains information like:
2x #ABC-12345 Widget, Black: $24.99 /4x #ABC-12344 Widget, Blue: $23.50
where the /
separates the line items; there is VBScript code which reads the value from a recordset and parses it out in a For
loop for display using something like (and this is pretty much exactly how the code reads, variable names and all) arydtls = split(rstmp("details"), "/")
. This method is repeated throughout the code for various tables.
It seems to me like it would be 100x better (not to mention easier to work with) to just have the details in a separate table and link back to it (funnily enough, for Orders it does do this, but the data doesn't always match the details text field because the OrderDetail table is updated in code; the details field is treated as read-only in the application).
Did my predecessor know something I didn't, or am I right to be saying "WTF?!!" when I look at this schema? It seems like it's insanely inefficient and difficult to maintain like this, and it makes running reports extra difficult because the data I need could be contained in text fields OR it might be in one of a dozen tables that have similar information and are used in different parts of the application.
The two likeliest scenarios are:
Since normalization can often be very expensive when it comes to query operations, we can sometimes get performance gains by eliminating an expensive join and doing the manipulations on the application side against a single row.
There is no absolute rule for database design that says "storing delimited values in a single row is better for this scenario". It's all about testing against your specific data sets and your usage patterns and making improvements where necessary.
In my experience it's not very common for this pattern to be an improvement over normalization though... that's pretty atypical.
Edit: A third possibility is that having n-values per row was a change from the original schema, and instead of adding a new table your predecessor resized the column. That's not necessarily different from the "incompetent" option :) but there are sometimes political pressures involved in db schema changes...
Did my predecessor know something I didn't, or am I right to be saying "WTF?!!" when I look at this schema?
No, your predecessor didn't. Yes, you are right. See note at the end, however.
It seems like it's insanely inefficient and difficult to maintain like this, and it makes running reports extra difficult because the data I need could be contained in text fields OR it might be in one of a dozen tables that have similar information and are used in different parts of the application.
It is insanely inefficient. See note at the end, however.
A column should always be an indivisible attribute of the row. I see two copies of three (maybe four) attributes in this column you've shown:
2x #ABC-12345 Widget, Black: $24.99 /4x #ABC-12344 Widget, Blue: $23.50
This would have been better designed as:
StockItems
Code char(10) primary key
Desc varchar(50)
Transaction
TxnId something primary key
: : :
TransactionPart
TxnId something \
TxnSeq int / primary key
Quantity integer
Code char(10) foreign key StockItems(Code)
Price float
NOTE:
It's possible that this was done to preserve historical information in the face of changing values elsewhere in the database. For example, if the description of a stock item changes or the item is deleted.
However, that's still not the right way to handle it. In that case, foreign key constraints would have stopped the item code from being deleted and processes should have been in place to prevent the description from being updated (such as versioning the stock item codes).
Of course, if you're never going to search on any of the items within that column, this is perfectly valid, though unwise in terms of possible future functionality to search on them.
Maybe the only thing ever searched on in this table is the customer code - then a free-format text field is adequate.
I still wouldn't do it that way but a YAGNI argument can be made that it would be better to change the DB schema in the future, if and when that search functionality needs to be added.
Quite simply he either had a reason or he didn't, without asking its impossible to know. If you make the assumption that he wasn't a total idea and thing of some possible reasons then maybe its one of the following.
If the data was for information only and "never going to change" as you hear so often then it may have been a quick win just to throw a display string straight to the field. After all, just replacing pipes with Tabs and slashes with BR's to put it on the screen is incredibly easy. If the code had to be written extremely quickly then this might've been the easy option.
A new feature since SQL 2005 is the XML data type. A major use of this is that you can store and index an unknown number of values against a particular record. You might care about the colour of one thing, the dimensions of another, the weight of something else. You might not be able to produce a definitive list of these things and a truly normalised generic method of storing this data may be too slow or overcomplicated for the system. This may have been a work around to try and get similar functionality.
The key thing here is, most things are done for a reason. You've looked at it the right way in trying to find out this reason. You might come accross it one day and think "Oh yeah!". Just looking at something from your own perspective can often lead to a can't see the wood for the trees scenario.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With