 

When is a good time to break normalization rules?

Please give your opinion of situations in which it is a good idea to NOT normalize. I just witnessed a heated discussion between an architect and a DBA in which one was arguing that the database was TOO normalized.

donde asked Jan 14 '10


1 Answer

The rule is: normalize till it hurts, then denormalize till it works. (Who said that?)

In general, I often denormalize when I have a lot of parent-child relationships and I know I would otherwise have to join five or six large tables to get one piece of data (the client id, for instance), while not needing any of the information from the intermediate tables most of the time. If at all possible, I try to denormalize only things that will not change frequently (such as id fields). But any time you denormalize, you have to write triggers or some other process (normally triggers, if it isn't something that can be handled through a PK/FK relationship and cascading updates) to make sure the data stays in sync. If you fail to do this at the database level, you will have data integrity problems and your data becomes useless. Do not think you can maintain the denormalization through application code; that is a recipe for disaster, as databases are often updated from places other than the application.
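As a rough sketch of what that looks like (the orders / order_lines / line_details tables and column names here are hypothetical, not from the question), copying a client id down to a detail table and keeping it in sync with a trigger might look something like this in MySQL:

    -- client_id really "belongs" to orders, but is copied down to
    -- line_details so reads don't have to join through the whole chain.
    ALTER TABLE line_details ADD COLUMN client_id INT NOT NULL;

    -- Backfill the denormalized column from the normalized chain.
    UPDATE line_details d
    JOIN order_lines  l ON l.line_id  = d.line_id
    JOIN orders       o ON o.order_id = l.order_id
    SET d.client_id = o.client_id;

    -- Keep the copy in sync at the database level, not in application code.
    DELIMITER //
    CREATE TRIGGER trg_orders_client_sync
    AFTER UPDATE ON orders
    FOR EACH ROW
    BEGIN
      IF NEW.client_id <> OLD.client_id THEN
        UPDATE line_details d
        JOIN order_lines l ON l.order_id = NEW.order_id
        SET d.client_id = NEW.client_id
        WHERE d.line_id = l.line_id;
      END IF;
    END//
    DELIMITER ;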

Denormalizing, even when done correctly, can slow inserts, updates and deletes, especially if you need to process large batches of data. It may or may not improve select speed, depending on how you need to query the data; if you end up needing a lot of self-joins to get the data back, you may have been better off not denormalizing. Never denormalize without testing whether you have actually improved performance. Remember that slowing inserts/updates/deletes has an overall effect on the system when many users are using it; by denormalizing to fix one problem, you may introduce a worse problem in the overall system. Don't just test the one query you are trying to speed up, test the performance of the whole system. You might speed up a query that runs once a month and slow down other queries that run thousands of times a day.
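A minimal way to sanity-check that claim, reusing the hypothetical tables from the sketch above, is to compare the plan and timing of the normalized join against the denormalized read before committing to the change, and to time representative writes as well:

    -- Normalized path: walk the parent chain to get the client id.
    EXPLAIN
    SELECT o.client_id
    FROM line_details d
    JOIN order_lines  l ON l.line_id  = d.line_id
    JOIN orders       o ON o.order_id = l.order_id
    WHERE d.detail_id = 12345;

    -- Denormalized path: read the copied column directly.
    EXPLAIN
    SELECT client_id FROM line_details WHERE detail_id = 12345;

    -- Also time representative inserts and updates: the sync trigger adds
    -- cost to every write, not just to the query you are trying to speed up.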

Denormalizing is often done for data warehouses, which are a special case: they are generally updated automatically on a schedule rather than one record at a time by users, and the DBAs who specialize in data warehousing tend to build them and know how to avoid the data integrity issues.

Another common denormalizing technique is to create a staging table for the data behind a complex report that doesn't need to run against real-time data. This is a sort of poor man's data warehouse and should never be done without a way to refresh the staging table on a schedule (as infrequently as you can get away with; this uses server resources that could usually be better spent elsewhere). Often these tables are updated when few users are on the system and lag a full day behind the real-time data. Don't consider doing this unless the query you are staging the data for is truly slow and cannot otherwise be optimized; many slow queries can be optimized without denormalization, as developers often choose the easiest-to-understand rather than the most performant way to select data.
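A bare-bones sketch of that idea, again with hypothetical table names and assuming MySQL's event scheduler is enabled (event_scheduler = ON), might look like this, refreshing the staging table once a night rather than on every report run:

    -- Hypothetical staging table for a slow monthly sales report.
    CREATE TABLE report_sales_staging (
      client_id    INT,
      sales_month  DATE,
      total_amount DECIMAL(12,2),
      PRIMARY KEY (client_id, sales_month)
    );

    -- Refresh on a schedule, ideally when few users are on the system.
    CREATE EVENT ev_refresh_report_sales
    ON SCHEDULE EVERY 1 DAY
    DO
      REPLACE INTO report_sales_staging (client_id, sales_month, total_amount)
      SELECT o.client_id,
             DATE_FORMAT(o.order_date, '%Y-%m-01'),
             SUM(l.amount)
      FROM orders o
      JOIN order_lines l ON l.order_id = o.order_id
      GROUP BY o.client_id, DATE_FORMAT(o.order_date, '%Y-%m-01');

The report then selects from report_sales_staging directly, accepting that its numbers lag the live data by up to a day.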

HLGEM answered Oct 13 '22