Any good literature on join performance vs systematic denormalization?

As a corollary to this question, I was wondering whether there are good comparative studies I could consult and pass along about the advantages of letting the RDBMS do the join optimization versus systematically denormalizing so that every access only ever hits a single table. A short sketch of the kind of contrast I mean follows the list below.

Specifically I want information about:

  • Performance of normalisation versus denormalisation.
  • Scalability of normalized vs denormalized systems.
  • Maintainability issues of denormalization.
  • Model consistency issues with denormalization.
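
To make the comparison concrete, here is a minimal sketch of the two designs, using a hypothetical customers/orders/order_items schema (none of these names come from our actual system):

    -- Normalized: the RDBMS does the join work at query time
    SELECT o.id, c.name, SUM(i.quantity * i.unit_price) AS order_total
    FROM orders o
    JOIN customers c   ON c.id = o.customer_id
    JOIN order_items i ON i.order_id = o.id
    GROUP BY o.id, c.name;

    -- Denormalized: everything pre-flattened into one wide table,
    -- so every read touches a single table
    SELECT order_id, customer_name, order_total
    FROM orders_flat;

The question is essentially which of these two access styles wins once scalability, maintenance and consistency are factored in.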

A bit of history to see where I am going here: our system uses an in-house database abstraction layer, but it is very old and cannot handle more than one table. As such, all complex objects have to be instantiated using multiple queries, one per related table. To make sure the system always uses a single table, heavy systematic denormalization is applied throughout the tables, sometimes flattening two or three levels deep. As for n-n relationships, they seem to have been worked around by carefully crafting the data model to avoid such relations and always fall back on 1-n or n-1.
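
For illustration, this is roughly what the one-table-at-a-time layer forces versus what a single join could do (hypothetical tables again, not our real schema):

    -- Abstraction layer style: one query per table, object assembled in application code
    SELECT * FROM orders      WHERE id = 123;
    SELECT * FROM customers   WHERE id = 42;    -- customer_id taken from the first result
    SELECT * FROM order_items WHERE order_id = 123;

    -- What the database could do in one round trip
    SELECT o.id, c.name, i.product_id, i.quantity, i.unit_price
    FROM orders o
    JOIN customers c   ON c.id = o.customer_id
    JOIN order_items i ON i.order_id = o.id
    WHERE o.id = 123;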

The end result is a convoluted, overly complex system where customers often complain about performance. When analyzing such bottlenecks, they never question the basic premises on which the system is based and always look for other solutions.

Did I miss something? I think the whole idea is wrong but somehow lack the irrefutable evidence to prove (or disprove) it. This is where I am turning to your collective wisdom: point me towards good, well-accepted literature that can convince the other fellows on my team that this approach is wrong (or convince me that I am just too paranoid and dogmatic about consistent data models).

My next step is building my own test bench and gathering results; since I hate reinventing the wheel, I want to know what already exists on the subject.
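
As a rough sketch of what that test bench might look like (assuming MySQL 8.0+, where EXPLAIN ANALYZE reports actual execution times; the schema is the same hypothetical one as above):

    -- Time the normalized join...
    EXPLAIN ANALYZE
    SELECT o.id, c.name, SUM(i.quantity * i.unit_price) AS order_total
    FROM orders o
    JOIN customers c   ON c.id = o.customer_id
    JOIN order_items i ON i.order_id = o.id
    GROUP BY o.id, c.name;

    -- ...against the flattened equivalent on the same data volume
    EXPLAIN ANALYZE
    SELECT order_id, customer_name, order_total
    FROM orders_flat;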

---- EDIT Notes: the system was first built with flat files, without a database system... only later was it ported to a database because a client insisted on the system using Oracle. They did not refactor but simply added support for relational databases to the existing system. Flat-file support was later dropped, but we are still awaiting refactoring to take advantage of the database.

Newtopian asked Aug 02 '09

People also ask

Does join improve performance?

Even though the join order has no impact on the final result, it still affects performance. The optimizer will therefore evaluate all possible join order permutations and select the best one. That means that just optimizing a complex statement (evaluating all those permutations) might itself become a performance problem.
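
As a hedged illustration (MySQL syntax; table names are hypothetical), the two statements below are logically identical, but the second uses STRAIGHT_JOIN to force the written join order instead of letting the optimizer pick one, which is one way to observe how much the chosen order matters:

    -- Optimizer chooses the join order
    SELECT c.name, o.id, i.product_id
    FROM customers c
    JOIN orders o      ON o.customer_id = c.id
    JOIN order_items i ON i.order_id = o.id;

    -- Written join order is forced
    SELECT STRAIGHT_JOIN c.name, o.id, i.product_id
    FROM customers c
    JOIN orders o      ON o.customer_id = c.id
    JOIN order_items i ON i.order_id = o.id;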

How does denormalization improve performance?

Denormalization can improve performance by:

  • Minimizing the need for joins.
  • Precomputing aggregate values, that is, computing them at data modification time rather than at select time.
  • Reducing the number of tables, in some cases.
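
A small sketch of the "precompute at modification time" idea, assuming MySQL trigger syntax and a hypothetical orders/order_items schema with a redundant total_amount column:

    -- Denormalized running total, maintained when data is modified...
    CREATE TRIGGER order_items_after_insert
    AFTER INSERT ON order_items
    FOR EACH ROW
      UPDATE orders
      SET total_amount = total_amount + NEW.quantity * NEW.unit_price
      WHERE orders.id = NEW.order_id;

    -- ...so reads avoid both the join and the aggregate
    SELECT id, total_amount FROM orders WHERE id = 123;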

Which is faster normalization or denormalization?

Normalization uses memory efficiently and hence tends to be faster. Denormalization, on the other hand, introduces some wastage of memory through duplication. Normalization maintains data integrity, i.e. any addition or deletion of data from a table will not create any mismatch in the relationships between tables.
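
For the integrity point, a minimal sketch of how the normalized design lets the database itself guard the relationship (hypothetical tables; MySQL/InnoDB-style foreign key):

    CREATE TABLE customers (
        id   INT PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    );

    CREATE TABLE orders (
        id          INT PRIMARY KEY,
        customer_id INT NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customers (id)
    );

    -- Deleting a customer that still has orders is rejected,
    -- so additions and deletions cannot silently break the relationship.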

What is the drawback of denormalization?

Disadvantages of denormalization: updates and inserts are more expensive. If a piece of data is updated in one table, all values duplicated in other tables need to be updated as well. Similarly, when inserting new values, we need to store the data both in the normalized table and in the denormalized table.
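
A minimal sketch of that update cost, with a hypothetical flattened table that duplicates the customer name on every order row:

    -- Normalized: one row to change
    UPDATE customers SET name = 'Acme Ltd' WHERE id = 42;

    -- Denormalized: every duplicated copy has to be changed as well
    UPDATE orders_flat SET customer_name = 'Acme Ltd' WHERE customer_id = 42;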


1 Answer

A thought: you have a clear impedance mismatch, a data access layer that allows access to only one table? Stop right there; this is simply inconsistent with optimal use of a relational database. Relational databases are designed to do complex queries really well. To have no option other than returning a single table, and presumably doing any joining in the business layer, just doesn't make sense.

For justification of normalisation, and the potential consistency costs, you can refer to all the material from Codd onwards; see the Wikipedia article.

I predict that benchmarking this kind of stuff will be a never-ending activity; special cases will abound. I claim that normalisation is "normal": people get good enough performance from a clean database design. Perhaps an approach might be a survey: "How normalised is your data? Scale 0 to 4."

djna answered Sep 22 '22