This question might be more apt to programmers.stackexchange. If so, please migrate.
I am currently pondering the complexity of typical data models. Everybody knows that data models should be normalized; on the other hand, a normalized data model requires quite a few joins to reassemble the data later, and joins are potentially expensive operations, depending on the size of the tables involved. So the question I am trying to figure out is: how does one usually approach this tradeoff in practice? That is, how many joins would you find acceptable in a typical query when designing a data model? This is especially interesting when a single query contains multiple joins.
As an example, let's say we have users, who own houses, in which there are rooms, which have drawers, which contain items. Trivially normalizing this with tables for users, houses, rooms, drawers, and items would later require me to join five tables to get all the items belonging to a certain user. This seems like an awful lot of complexity to me.
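To make the example concrete, here is a minimal sketch of that normalized schema and the five-table query, using SQLite for portability; all table and column names are hypothetical, invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per entity, each child referencing its parent (hypothetical schema).
conn.executescript("""
CREATE TABLE users   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE houses  (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
CREATE TABLE rooms   (id INTEGER PRIMARY KEY, house_id INTEGER REFERENCES houses(id));
CREATE TABLE drawers (id INTEGER PRIMARY KEY, room_id INTEGER REFERENCES rooms(id));
CREATE TABLE items   (id INTEGER PRIMARY KEY, drawer_id INTEGER REFERENCES drawers(id), name TEXT);

INSERT INTO users   VALUES (1, 'alice');
INSERT INTO houses  VALUES (1, 1);
INSERT INTO rooms   VALUES (1, 1);
INSERT INTO drawers VALUES (1, 1);
INSERT INTO items   VALUES (1, 1, 'keys'), (2, 1, 'coins');
""")

# All items belonging to a given user: five tables, four joins.
rows = conn.execute("""
    SELECT items.name
    FROM users
    JOIN houses  ON houses.user_id  = users.id
    JOIN rooms   ON rooms.house_id  = houses.id
    JOIN drawers ON drawers.room_id = rooms.id
    JOIN items   ON items.drawer_id = drawers.id
    WHERE users.name = 'alice'
""").fetchall()

print(sorted(r[0] for r in rows))  # → ['coins', 'keys']
```

With indexes on the foreign-key columns, each join step is a cheap index lookup, which is why chains like this are usually less scary than they look.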
Most likely the size of the tables plays a role, too. Joining five tables with little data is not as bad as joining three tables with millions of rows each. Or is this consideration wrong?
How many types of JOINs are there in SQL? There are four main types: INNER JOIN, OUTER JOIN, CROSS JOIN, and SELF JOIN. However, remember that OUTER JOIN has three subtypes: LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN.
If you want to PROPERLY join N tables together, you need at least N-1 join conditions.
So four are needed here. It is as simple as laying five balls out in a straight line and counting the gaps between them. If you leave out the join conditions, what you get is the Cartesian product of the tables (a CROSS JOIN), which is one great big mess of rows.
It is possible to use multiple join statements together to join more than one table at the same time. To do that you add a second INNER JOIN statement and a second ON statement to indicate the third table and the second relationship.
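A short sketch of the N-1 rule with three hypothetical tables: two ON conditions chain them correctly, and dropping the conditions multiplies the row counts instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER);
CREATE TABLE c (id INTEGER PRIMARY KEY, b_id INTEGER);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (1, 1), (2, 2);
INSERT INTO c VALUES (1, 1), (2, 2);
""")

# Three tables, two ON conditions (N-1 = 2): one row per matching chain.
joined = conn.execute("""
    SELECT a.id, b.id, c.id
    FROM a
    INNER JOIN b ON b.a_id = a.id
    INNER JOIN c ON c.b_id = b.id
""").fetchall()
print(len(joined))  # → 2

# No conditions at all: the Cartesian product, 2 * 2 * 2 = 8 rows.
cross = conn.execute(
    "SELECT a.id, b.id, c.id FROM a CROSS JOIN b CROSS JOIN c"
).fetchall()
print(len(cross))  # → 8
```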
There are reasons for database normalization, and I've seen queries with more than 20 tables and sub-queries joined together, working just fine for a long time. I find the concept of normalization a huge win, as it allows me to introduce new features into existing, working applications without affecting the parts that already work.
Databases come with different features to make your life easier, such as views and functions for isolating your schema.
A while back there was a somewhat related question, How does a SQL statement containing multiple joins work? It might be worthwhile to look into it as well.
Developing an application against a normalized database is easier, because with the proper approach you can isolate your schema behind views and functions and make your application code immune to schema changes. If you go for a denormalized design, it might happen that design changes affect a great deal of your code, as denormalized systems tend to be highly performance-optimized at the cost of changeability.
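The isolation idea can be sketched with a view: the application queries the view, and if the underlying tables are later restructured, only the view's definition has to change. All names here are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE houses (id INTEGER PRIMARY KEY, user_id INTEGER, address TEXT);
INSERT INTO users  VALUES (1, 'alice');
INSERT INTO houses VALUES (1, 1, '1 Main St');

-- The view hides the join; application code only ever sees user_houses.
CREATE VIEW user_houses AS
    SELECT users.name AS owner, houses.address
    FROM users JOIN houses ON houses.user_id = users.id;
""")

# Application-side query: no knowledge of the underlying two tables.
rows = conn.execute("SELECT owner, address FROM user_houses").fetchall()
print(rows)  # → [('alice', '1 Main St')]
```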
Normalizing databases is an art form in itself.
If you structure your joins correctly, you will only be grabbing the columns needed.
It should be much faster to run a query against millions of records across multiple tables, joining only the needed fields, than it would be with one or two tables holding all the records.
In the second case you are retrieving all of the data, and sorting through it in code would be a nightmare.
MySQL is very good at retrieving only the data requested.
Just because the query is long doesn't mean it is slower.
I have seen query statements well over 20 lines of code that were very fast.
Have faith in the query you write, and if in doubt, write a test script and try it yourself.
A totally normalized data model carries a higher performance cost but is more resilient to change. A data model flat as a dime, tuned for one query, will perform much better, but you will have to pay the price when the specs change.
So maybe the question is: will the use of your data model (the queries) change a lot? If not, don't normalize; just tune for the specific queries (ask your DBA). Otherwise, normalize and judge from the query execution plan whether you use too many joins; I can't give you a specific number.
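Inspecting the execution plan is how you make that judgment concrete. MySQL uses `EXPLAIN`; the sketch below uses SQLite's equivalent, `EXPLAIN QUERY PLAN`, with a hypothetical two-table schema, to see whether each join step hits an index or falls back to a full scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE houses (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE INDEX idx_houses_user ON houses(user_id);
""")

# One plan row per table: look for SEARCH ... USING INDEX rather than SCAN
# on the inner side of the join.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM users JOIN houses ON houses.user_id = users.id
""").fetchall()

for row in plan:
    print(row[-1])  # the human-readable detail column
```

If a join step shows a full scan of a large table, that join (or the missing index behind it) is the problem, not the join count itself.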