 

How many joins are feasible in practice?

This question might be more apt to programmers.stackexchange. If so, please migrate.

I am currently pondering the complexity of typical data models. Everybody knows that data models should be normalized; on the other hand, a normalized data model will require quite a few joins to reassemble the data later, and joins are potentially expensive operations, depending on the size of the tables involved. So the question I am trying to figure out is how one usually goes about this tradeoff: in practice, how many joins would you find acceptable in typical queries when designing a data model? This is especially interesting for queries that stack multiple joins together.

As an example, let's say we have users, who own houses, in which there are rooms, which have drawers, which contain items. Trivially normalizing this with tables for users, houses, rooms, drawers, and items in the sense explained above would later require me to join five tables to get all the items belonging to a certain user. This seems like an awful lot of complexity to me.
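To make the example concrete, here is a minimal sketch of that schema and the five-table query using SQLite; all table and column names are illustrative assumptions. Note that joining five tables takes four `JOIN` clauses:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE houses  (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    CREATE TABLE rooms   (id INTEGER PRIMARY KEY, house_id INTEGER REFERENCES houses(id));
    CREATE TABLE drawers (id INTEGER PRIMARY KEY, room_id INTEGER REFERENCES rooms(id));
    CREATE TABLE items   (id INTEGER PRIMARY KEY, drawer_id INTEGER REFERENCES drawers(id), name TEXT);

    INSERT INTO users   VALUES (1, 'alice');
    INSERT INTO houses  VALUES (1, 1);
    INSERT INTO rooms   VALUES (1, 1);
    INSERT INTO drawers VALUES (1, 1);
    INSERT INTO items   VALUES (1, 1, 'keys'), (2, 1, 'wallet');
""")

# Five tables, four JOIN clauses: all items belonging to one user.
rows = conn.execute("""
    SELECT items.name
    FROM users
    JOIN houses  ON houses.user_id  = users.id
    JOIN rooms   ON rooms.house_id  = houses.id
    JOIN drawers ON drawers.room_id = rooms.id
    JOIN items   ON items.drawer_id = drawers.id
    WHERE users.name = 'alice'
    ORDER BY items.id
""").fetchall()
print([name for (name,) in rows])  # ['keys', 'wallet']
```

As long as the join columns are indexed (foreign keys usually are), each extra join is an index lookup rather than a scan.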

Most likely the size of the tables is involved, too. Joining five tables with little data is not as bad as joining three tables with millions of rows. Or is this consideration wrong?

asked by LiKao on Jun 29 '12



3 Answers

There are good reasons for database normalization, and I've seen queries with more than 20 tables and sub-queries joined together, working just fine for a long time. I find normalization a huge win, as it allows me to add new features to existing working applications without affecting the parts that already work.

Databases come with different features to make your life easier:

  • you can create views for the most commonly used queries (although this is not the only use case for views);
  • some RDBMSs provide Common Table Expressions (CTEs), which let you use named sub-queries and also recursive queries;
  • some RDBMSs provide procedural extension languages (like PL/SQL or PL/pgSQL) that let you write your own functions, hiding the complexity of your schema and exposing only API calls to operate on your data.
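For instance, a view can hide a multi-table join behind a single name, so application code never sees the join at all. A minimal sketch using SQLite, with assumed table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE houses (id INTEGER PRIMARY KEY, user_id INTEGER, city TEXT);
    INSERT INTO users  VALUES (1, 'alice');
    INSERT INTO houses VALUES (1, 1, 'Berlin');

    -- The view encapsulates the join; callers query it like a plain table.
    CREATE VIEW user_houses AS
    SELECT users.name AS owner, houses.city
    FROM users
    JOIN houses ON houses.user_id = users.id;
""")

result = conn.execute("SELECT owner, city FROM user_houses").fetchall()
print(result)  # [('alice', 'Berlin')]
```

If the underlying schema changes later, only the view definition needs updating; queries against `user_houses` keep working.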

A while back there was a somewhat related question, How does a SQL statement containing multiple joins work? It might be worthwhile to look into it as well.

Developing an application against a normalized database is easier, because with the proper approach you can isolate your schema behind views/functions and make your application code immune to schema changes. If you go for a denormalized design, design changes may affect a great deal of your code, as denormalized systems tend to be heavily performance-optimized at the cost of changeability.

answered by vyegorov on Nov 05 '22


Normalizing databases is an art form in itself. If you structure your joins correctly, you will only be grabbing the columns you need. It should be much faster to run a query against millions of records spread across multiple tables, joining only the needed fields, than it would be with, say, one or two tables holding all the records. In the second case you are retrieving all of the data, and sorting through it would be a coding nightmare. MySQL is very good at retrieving only the data requested. Just because a query is long doesn't mean it is slow. I have seen query statements well over 20 lines of code that were very fast.

Have faith in the queries you write, and if you're not sure, write a test script and try it yourself.

answered by The_asMan on Nov 05 '22


A totally normalized data model has a bigger performance cost but is more resilient to change. A data model as flat as a dime, tuned for one query, will perform much better, but you will have to pay the price when the specs change.

So maybe the question is: will the use of your data model (the queries) change a lot? If not, don't normalize; just tune for the specific queries (ask your DBA). Otherwise normalize, and judge from the query execution plan whether you're using too many joins; I can't give you a specific number.
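Inspecting the execution plan is straightforward: MySQL has `EXPLAIN`, and SQLite has `EXPLAIN QUERY PLAN`. A minimal sketch in SQLite (the exact plan text varies by version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE houses (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE INDEX idx_houses_user ON houses(user_id);
""")

# Each returned row describes one step of the plan; look for
# SEARCH (index lookup) vs SCAN (full table scan) on each table.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM users JOIN houses ON houses.user_id = users.id
""").fetchall()
for row in plan:
    print(row)
```

If every joined table shows an index-backed step, adding one more join is usually cheap; a full scan on a big table is the thing to worry about, not the join count itself.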

answered by Hubert on Nov 05 '22