Any good literature on join performance vs systematic denormalization?

As a corollary to this question, I was wondering whether there are good comparative studies I could consult and pass along about the advantages of letting the RDBMS do the join optimization versus systematically denormalizing so that every access only ever hits a single table. A short sketch of the kind of contrast I mean follows the list below.

Specifically I want information about:

  • Performance of normalisation versus denormalisation.
  • Scalability of normalized vs denormalized systems.
  • Maintainability issues of denormalization.
  • Model consistency issues with denormalization.
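
To make the comparison concrete, here is a minimal sketch of the two designs, using a hypothetical customers/orders/order_items schema (none of these names come from our actual system):

    -- Normalized: the RDBMS does the join work at query time
    SELECT o.id, c.name, SUM(i.quantity * i.unit_price) AS order_total
    FROM orders o
    JOIN customers c   ON c.id = o.customer_id
    JOIN order_items i ON i.order_id = o.id
    GROUP BY o.id, c.name;

    -- Denormalized: everything pre-flattened into one wide table,
    -- so every read touches a single table
    SELECT order_id, customer_name, order_total
    FROM orders_flat;

The question is essentially which of these two access styles wins once scalability, maintenance and consistency are factored in.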

A bit of history to see where I am going here: our system uses an in-house database abstraction layer, but it is very old and cannot handle more than one table. As such, all complex objects have to be instantiated using multiple queries, one per related table. To make sure the system always uses a single table, heavy systematic denormalization is applied throughout the tables, sometimes flattening two or three levels deep. As for n-n relationships, they seem to have been worked around by carefully crafting the data model to avoid such relations and always fall back on 1-n or n-1.
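
For illustration, this is roughly what the one-table-at-a-time layer forces versus what a single join could do (hypothetical tables again, not our real schema):

    -- Abstraction layer style: one query per table, object assembled in application code
    SELECT * FROM orders      WHERE id = 123;
    SELECT * FROM customers   WHERE id = 42;    -- customer_id taken from the first result
    SELECT * FROM order_items WHERE order_id = 123;

    -- What the database could do in one round trip
    SELECT o.id, c.name, i.product_id, i.quantity, i.unit_price
    FROM orders o
    JOIN customers c   ON c.id = o.customer_id
    JOIN order_items i ON i.order_id = o.id
    WHERE o.id = 123;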

The end result is a convoluted, overly complex system where customers often complain about performance. When analyzing such bottlenecks, they never question the basic premises on which the system is based and always look for other solutions.

Did I miss something? I think the whole idea is wrong but somehow lack the irrefutable evidence to prove (or disprove) it. This is where I am turning to your collective wisdom: point me towards good, well-accepted literature that can convince the other fellows on my team that this approach is wrong (or convince me that I am just too paranoid and dogmatic about consistent data models).

My next step is building my own test bench and gathering results; since I hate reinventing the wheel, I want to know what already exists on the subject.
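
As a rough sketch of what that test bench might look like (assuming MySQL 8.0+, where EXPLAIN ANALYZE reports actual execution times; the schema is the same hypothetical one as above):

    -- Time the normalized join...
    EXPLAIN ANALYZE
    SELECT o.id, c.name, SUM(i.quantity * i.unit_price) AS order_total
    FROM orders o
    JOIN customers c   ON c.id = o.customer_id
    JOIN order_items i ON i.order_id = o.id
    GROUP BY o.id, c.name;

    -- ...against the flattened equivalent on the same data volume
    EXPLAIN ANALYZE
    SELECT order_id, customer_name, order_total
    FROM orders_flat;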

---- EDIT Notes: the system was first built with flat files, without a database system... only later was it ported to a database because a client insisted on the system using Oracle. They did not refactor but simply added support for relational databases to the existing system. Flat-file support was later dropped, but we are still awaiting refactoring to take advantage of the database.

Newtopian asked Aug 02 '09

People also ask

Does join improve performance?

Even though the join order has no impact on the final result, it still affects performance. The optimizer will therefore evaluate all possible join order permutations and select the best one. That means that just optimizing a complex statement (evaluating all those permutations) might itself become a performance problem.
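
As a hedged illustration (MySQL syntax; table names are hypothetical), the two statements below are logically identical, but the second uses STRAIGHT_JOIN to force the written join order instead of letting the optimizer pick one, which is one way to observe how much the chosen order matters:

    -- Optimizer chooses the join order
    SELECT c.name, o.id, i.product_id
    FROM customers c
    JOIN orders o      ON o.customer_id = c.id
    JOIN order_items i ON i.order_id = o.id;

    -- Written join order is forced
    SELECT STRAIGHT_JOIN c.name, o.id, i.product_id
    FROM customers c
    JOIN orders o      ON o.customer_id = c.id
    JOIN order_items i ON i.order_id = o.id;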

How does denormalization improve performance?

Denormalization can improve performance by:

  • Minimizing the need for joins.
  • Precomputing aggregate values, that is, computing them at data modification time rather than at select time.
  • Reducing the number of tables, in some cases.
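
A small sketch of the "precompute at modification time" idea, assuming MySQL trigger syntax and a hypothetical orders/order_items schema with a redundant total_amount column:

    -- Denormalized running total, maintained when data is modified...
    CREATE TRIGGER order_items_after_insert
    AFTER INSERT ON order_items
    FOR EACH ROW
      UPDATE orders
      SET total_amount = total_amount + NEW.quantity * NEW.unit_price
      WHERE orders.id = NEW.order_id;

    -- ...so reads avoid both the join and the aggregate
    SELECT id, total_amount FROM orders WHERE id = 123;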

Which is faster normalization or denormalization?

Normalization uses memory efficiently and hence tends to be faster. Denormalization, on the other hand, introduces some wastage of memory through duplication. Normalization maintains data integrity, i.e. any addition or deletion of data from a table will not create any mismatch in the relationships between tables.
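
For the integrity point, a minimal sketch of how the normalized design lets the database itself guard the relationship (hypothetical tables; MySQL/InnoDB-style foreign key):

    CREATE TABLE customers (
        id   INT PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    );

    CREATE TABLE orders (
        id          INT PRIMARY KEY,
        customer_id INT NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customers (id)
    );

    -- Deleting a customer that still has orders is rejected,
    -- so additions and deletions cannot silently break the relationship.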

What is the drawback of denormalization?

Disadvantages of denormalization: updates and inserts are more expensive. If a piece of data is updated in one table, all values duplicated in other tables need to be updated as well. Similarly, when inserting new values, we need to store the data both in the normalized table and in the denormalized table.
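
A minimal sketch of that update cost, with a hypothetical flattened table that duplicates the customer name on every order row:

    -- Normalized: one row to change
    UPDATE customers SET name = 'Acme Ltd' WHERE id = 42;

    -- Denormalized: every duplicated copy has to be changed as well
    UPDATE orders_flat SET customer_name = 'Acme Ltd' WHERE customer_id = 42;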


1 Answer

A thought: you have a clear impedance mismatch, a data access layer that allows access to only one table? Stop right there; this is simply inconsistent with optimal use of a relational database. Relational databases are designed to do complex queries really well. To have no option other than returning a single table, and presumably doing any joining in the business layer, just doesn't make sense.

For justification of normalisation, and the potential consistency costs, you can refer to all the material from Codd onwards; see the Wikipedia article.

I predict that benchmarking this kind of stuff will be a never-ending activity; special cases will abound. I claim that normalisation is "normal": people get good enough performance from a clean database design. Perhaps an approach might be a survey: "How normalised is your data? Scale 0 to 4."

djna answered Sep 22 '22