When would a database design be described as overnormalized? Is this characterization an absolute one? Or is it dependent on the way it is used in the application? Thanks.
Normalization means minimizing redundancy in stored data: instead of repeating values, you set up relationships (often enforced with foreign key constraints) between multiple tables. However, while normalization might lead to less stored data, it often creates performance problems, because many queries now end up joining multiple tables.
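To make that concrete, here is a minimal sketch of a normalized pair of tables (the table and column names are made up for the example): the user's name is stored once in Users and referenced from Messages through a foreign key, so reading a message together with its author's name requires a join.

```sql
-- Minimal normalized sketch (illustrative names): user data lives in one place
-- and Messages only holds a foreign key to it.
CREATE TABLE Users (
    user_id   INTEGER PRIMARY KEY,
    user_name VARCHAR(100) NOT NULL
);

CREATE TABLE Messages (
    message_id INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES Users (user_id),  -- no duplicated user data
    body       TEXT
);

-- Reading a message with its author's name now requires a join:
SELECT m.message_id, m.body, u.user_name
FROM Messages AS m
JOIN Users AS u ON u.user_id = m.user_id;
```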
Normalization reduces complexity overall and can improve query speed. Too much normalization, however, can be just as bad, since it comes with its own set of problems. I've worked at several companies and have seen both first-hand: it's a pain when it's done wrong, and it's an early day when it's done right.
In a normalized database, queries against the Messages table would need to join the Users and Categories tables. To improve performance and avoid those joins, we can add the primary or unique key from the Users table directly to the Messages table.
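A rough sketch of what that could look like (the column names are assumptions on my part): with the keys stored directly on Messages, the common read path becomes a single-table query.

```sql
-- Denormalized sketch (illustrative names): the keys from Users and Categories
-- are stored directly on Messages, so the common queries need no joins.
CREATE TABLE Messages (
    message_id  INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,   -- key copied from Users
    category_id INTEGER NOT NULL,   -- key copied from Categories
    body        TEXT
);

-- Listing one user's messages in a category touches only Messages:
SELECT message_id, body
FROM Messages
WHERE user_id = 7 AND category_id = 3;
```

The usual denormalization trade-off applies: whatever you copy out of Users or Categories has to be kept consistent with its source.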
In the general sense, I think a design is overnormalized when you are doing so many JOINs to retrieve data that it causes notable performance penalties and deadlocks on your database, even after you've tuned the heck out of your indexes. Obviously, for huge applications and sites like MySpace or eBay, denormalization is a scaling requirement.
As a developer for several small businesses, I can tell you that in my experience it's always been easier to go from normalized to denormalized than the other way around; going back (to get rid of duplicated data now that the business requirements have changed a year or so later) is much more difficult.
When I read general statements such as "you should put the address in your customers table instead of a separate address table so you can avoid the join", I shudder, because you just know that a year from now somebody's going to ask you to do something with addresses that you totally didn't foresee, like maintaining an audit trail or storing multiple addresses per customer. If your database allows you to create an indexed view, you can sidestep that issue until you get to the point where your dataset is so large that it can't exist on, or be served by, a single server or set of servers in a 1-write, many-read environment. For most of us, I don't think that scenario happens very often.
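For what it's worth, here is a rough sketch of that approach in SQL Server syntax (table and column names are made up; other engines offer materialized views instead): the base tables stay normalized, and the unique clustered index is what materializes the joined result.

```sql
-- Sketch of an indexed view over a normalized Customers/Addresses pair.
-- SQL Server requires SCHEMABINDING and two-part table names for this.
CREATE VIEW dbo.CustomerAddresses
WITH SCHEMABINDING
AS
SELECT c.customer_id, c.customer_name, a.address_id, a.street, a.city
FROM dbo.Customers AS c
JOIN dbo.Addresses AS a ON a.customer_id = c.customer_id;
GO

-- The unique clustered index materializes the view, so reads avoid the join
-- while writes still go to the normalized base tables.
CREATE UNIQUE CLUSTERED INDEX IX_CustomerAddresses
ON dbo.CustomerAddresses (customer_id, address_id);
```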
When in doubt, I aim for third normal form, with some exceptions (for example, having a field hold a comma-separated list of strings because I know I'll never ever look at the data from the other angle). When I need to consolidate, I'll look at my views or indexes first. Hope this helps.
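As an illustration of that kind of exception (the table is made up), the tags below are deliberately stored as one comma-separated string because they are only ever read back whole:

```sql
-- Deliberate exception to 3NF (illustrative names): the tags column holds a
-- comma-separated string because the data is never queried tag-by-tag.
CREATE TABLE Reports (
    report_id INTEGER PRIMARY KEY,
    title     VARCHAR(200) NOT NULL,
    tags      VARCHAR(500)  -- e.g. 'monthly,finance,draft'
);
```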