As a DBA for MS SQL 2000 and 2005, I regularly see giant select queries JOINing 7-10 or even more tables. I find, though, that there is a certain point past which performance tends to suffer, and the query becomes very difficult to debug and/or improve.
So is there a "rule of thumb" for when I should be considering other query methods, like temp tables to hold preliminary results? Or is there a point after which the SQL query optimizer just doesn't do a very good job of figuring out the best plan?
You can add up to 255 fields from as many as 32 tables or queries.
With SQL Server 6.5 and earlier the limit is 16 tables per query, regardless of which edition of SQL Server you are running (Enterprise Edition allows no more than Standard). With SQL Server 7.0 the limit is 256.
Once the database has too many tables, it becomes conceptually difficult to understand and manage. A schema with more than 250 tables is getting to the point of conceptual overload.
The limit is based on the size of the internal structures generated for the parsed SQL statement; it is 32766 if CQE processed the SELECT statement.
A lot of times you can alleviate the visual smell by creating helper views; I do not think there is a hard and fast rule for how many joins are considered bad.
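For example, a handful of lookup joins can be hidden behind a view so the outer query stays readable. This is only a sketch; the table and column names here are hypothetical:

    -- Hypothetical helper view that hides three lookup joins behind one name
    CREATE VIEW dbo.vOrderDetails
    AS
    SELECT o.OrderID,
           o.OrderDate,
           c.CustomerName,
           p.ProductName,
           s.StatusDescription
    FROM dbo.Orders o
    JOIN dbo.Customers   c ON c.CustomerID = o.CustomerID
    JOIN dbo.Products    p ON p.ProductID  = o.ProductID
    JOIN dbo.OrderStatus s ON s.StatusID   = o.StatusID;
    GO

    -- The reporting query now reads as one join instead of four
    SELECT OrderDate, CustomerName, ProductName
    FROM dbo.vOrderDetails
    WHERE StatusDescription = 'Shipped';

Note that a view only helps readability; the optimiser still expands it into the full join, so performance is unchanged.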
Unlike procedural coding, breaking down SQL into little bits and pieces can result in inefficient queries.
The SQL optimiser will work just fine with lots of table joins, and if you hit a corner case you can specify the join order or style using hints. In reality I think it is very rare to see queries that join more than, say, 10 tables, but it is quite feasible that this could happen in a reporting-type scenario.
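As a sketch of what those hints look like (the table names are hypothetical), you can force a physical join type on a single join, or keep the join order exactly as written:

    -- Force a hash join for this particular join
    SELECT o.OrderID, c.CustomerName
    FROM dbo.Orders o
    INNER HASH JOIN dbo.Customers c ON c.CustomerID = o.CustomerID;

    -- Or tell the optimiser to use the join order of the FROM clause as written
    SELECT o.OrderID, c.CustomerName, p.ProductName
    FROM dbo.Orders o
    JOIN dbo.Customers c ON c.CustomerID = o.CustomerID
    JOIN dbo.Products  p ON p.ProductID  = o.ProductID
    OPTION (FORCE ORDER);

Hints should be a last resort, since they pin the plan even after the data distribution changes.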
If you have lots of joins AND have confirmed that this particular query is a bottleneck AND you have all the correct indexes in place, you probably need to refactor. However, keep in mind that the large number of joins may only be a symptom, not the root cause of the issue. The standard practice for query optimisation should be followed (look at the profiler, the query plan, the database structure, the logic, etc.).
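A minimal sketch of that investigation step (the query shown is a hypothetical stand-in for the suspect multi-join query):

    -- Show logical reads and elapsed time while tuning the suspect query
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Hypothetical multi-join query under investigation
    SELECT o.OrderID, c.CustomerName
    FROM dbo.Orders o
    JOIN dbo.Customers c ON c.CustomerID = o.CustomerID;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;

Combined with the actual execution plan, the per-table logical reads usually show whether the joins themselves or a missing index are the real problem.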
SQL Server already uses tempdb behind the scenes for hash and many-to-many merge joins, so there is usually no need to create a temp table just to refactor a single SELECT query.
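If you do decide to split a query up, the usual pattern on SQL Server 2000/2005 is to SELECT ... INTO a #temp table for the preliminary result and then join against it. Again, the names here are hypothetical:

    -- Materialise the expensive intermediate result once
    SELECT o.CustomerID, SUM(o.Amount) AS TotalAmount
    INTO #CustomerTotals
    FROM dbo.Orders o
    GROUP BY o.CustomerID;

    -- Then join the much smaller intermediate set to the remaining tables
    SELECT c.CustomerName, t.TotalAmount
    FROM #CustomerTotals t
    JOIN dbo.Customers c ON c.CustomerID = t.CustomerID;

    DROP TABLE #CustomerTotals;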
It really depends on how big your tables are: even if you are only joining 2 tables together, if they hold 100M records the join is going to be a slow process anyway.
If you have X records in table A and Y records in table B and you join them, you may get up to X*Y records back. A result that large forces the server into swap memory during the process, which is slow; by comparison, small queries can work largely out of the CPU L2 cache, which gives the best performance.
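To illustrate that X*Y upper bound (hypothetical table names), a join with no restrictive predicate degenerates into a Cartesian product:

    -- If TableA has 1,000 rows and TableB has 1,000 rows,
    -- this returns a count of 1,000,000
    SELECT COUNT(*)
    FROM dbo.TableA a
    CROSS JOIN dbo.TableB b;

A join with selective predicates and supporting indexes returns far fewer rows, which is why the same number of joins can be either trivial or crippling.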
However, if you feel you really need to join a lot of tables to achieve the goal, I suggest your database is over-normalised. Third normal form works well in most scenarios; don't try to split the information up too much, as that is recognised to be inefficient for querying.
Yes, if necessary create a table to cache the results of the heavy query, and update its fields only when necessary, or even just once a day.
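A minimal sketch of that caching approach (the table, columns, and refresh schedule are all assumptions):

    -- Cache table refreshed once a day instead of re-running the heavy join on demand
    CREATE TABLE dbo.CustomerTotalsCache (
        CustomerID  int      NOT NULL PRIMARY KEY,
        TotalAmount money    NOT NULL,
        RefreshedAt datetime NOT NULL DEFAULT GETDATE()
    );

    -- Nightly refresh, e.g. scheduled as a SQL Agent job
    TRUNCATE TABLE dbo.CustomerTotalsCache;

    INSERT INTO dbo.CustomerTotalsCache (CustomerID, TotalAmount)
    SELECT o.CustomerID, SUM(o.Amount)
    FROM dbo.Orders o
    GROUP BY o.CustomerID;

Reports then read from the small cache table, accepting that the data may be up to a day stale.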