When creating a database structure, what are good guidelines to follow or good ways to determine how far a database should be normalized? Should you create an un-normalized database and split it apart as the project progresses? Should you create it fully normalized and combine tables as needed for performance?
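Note that "normalization" also has an unrelated meaning in statistics and machine learning, where it means rescaling feature values to a common range. That kind of normalization is useful when your data has varying scales and the algorithm you are using does not make assumptions about the distribution of your data, as with k-nearest neighbors and artificial neural networks. Standardization, by contrast, works best when your data is roughly Gaussian (bell curve) distributed.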
There are some good reasons not to normalize. One is that joins are expensive: normalizing your database often involves creating lots of tables, and you can easily wind up with what you think should be a simple query that spans five or ten tables.
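To make the join cost concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (customers, orders, order_items, products) and the sample rows are hypothetical, invented only for illustration. It shows how a question as simple as "how much has each customer spent?" already touches four tables once the data is normalized.

```python
import sqlite3

# Hypothetical normalized schema, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    price_cents INTEGER NOT NULL
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
);
CREATE TABLE order_items (
    order_id INTEGER NOT NULL REFERENCES orders(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO customers VALUES (1, 'Ada');
INSERT INTO products VALUES (1, 'Keyboard', 5000), (2, 'Mouse', 2000);
INSERT INTO orders VALUES (1, 1);
INSERT INTO order_items VALUES (1, 1, 1), (1, 2, 2);
""")


def customer_totals(conn):
    """Return (customer name, total spent in cents) for each customer.

    A "simple" report, yet it needs three joins across four tables.
    """
    return conn.execute("""
        SELECT c.name, SUM(p.price_cents * oi.quantity)
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        JOIN order_items oi ON oi.order_id = o.id
        JOIN products p ON p.id = oi.product_id
        GROUP BY c.id, c.name
        ORDER BY c.name
    """).fetchall()


print(customer_totals(conn))
```

Whether those joins are actually expensive depends on data volume and indexing, so it is worth measuring before concluding that the schema needs to change.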
Start by designing a normalized database, up to third normal form (3NF). As you develop the business logic layer you may decide you have to denormalize a bit, but never, never go below 3NF, and always stay compliant with first and second normal form. Denormalize for simplicity of code, not for performance. Use indexes and stored procedures for that :)
The reason not to "normalize as you go" is that you would have to modify code you have already written almost every time you modify the database design.
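As a rough illustration of the "use indexes for performance" advice, the sketch below keeps a normalized schema untouched and adds an index instead of denormalizing. It uses Python's built-in sqlite3 module; the table names, the query, and the index name idx_orders_customer_id are assumptions made up for this example. SQLite has no stored procedures, so only the index half of the advice is shown.

```python
import sqlite3

# Hypothetical schema, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    placed_on TEXT NOT NULL
);
""")

QUERY = "SELECT id, placed_on FROM orders WHERE customer_id = ?"


def query_plan(conn):
    """Return SQLite's query plan for QUERY as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Without an index, the lookup has to scan the whole orders table.
plan_before = query_plan(conn)

# Adding an index changes how the query runs; the schema stays normalized
# and the application's SQL does not change.
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, but the second plan should mention the new index, while the first should describe a full scan of orders.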
There are a couple of good articles:
http://www.agiledata.org/essays/dataNormalization.html