sql query joins multiple tables - too slow (8 tables)

People also ask

How many joining conditions do you need for 10 tables?

You can join on a common column even if no foreign key relationship is set up between the tables. To connect 10 tables in one query you need at least nine join conditions. In theory there are 45 possible table pairs (10 × 9 / 2) that could be related to each other; counting the different column combinations between each pair would make that number much bigger.

How many joining conditions do you need for 5 tables?

Four are needed. It is as simple as laying five balls out in a straight line and counting the gaps between them. The exception is when you actually want every combination of rows from all five tables in one great big mess of a result, in which case you could use a CROSS JOIN, which takes no join condition.

Why do joins slow down queries?

Joins: If your query joins two tables in a way that substantially increases the row count of the result set, your query is likely to be slow.

Aggregations: Combining multiple rows to produce a result requires more computation than simply retrieving those rows.


I had a similar problem with several lookup tables joining to a large table, with all id fields indexed. To monitor the effect of the joins on query execution time, I ran my query several times (limited to the first 100 rows), adding a join to one more table each time. Up to 12 joined tables there was no significant change in execution time. With the 13th table the execution time jumped to 1 second; with the 14th, 4 seconds; with the 15th, 20 seconds; with the 16th, 90 seconds.
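
For anyone who wants to repeat that kind of measurement, here is a rough sketch of the add-one-join-at-a-time approach. It is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in, the helper names (build_db, join_query, time_joins) and the table sizes are made up, and the timings will not reproduce what MySQL or any other server does with its own optimizer.

```python
import sqlite3
import time

def build_db(n_lookups, n_rows=1000):
    """Create t1 plus n_lookups lookup tables (t2, t3, ...) in an in-memory database."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (t1_id INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(n_rows)])
    for k in range(2, n_lookups + 2):
        con.execute(f"CREATE TABLE t{k} (t{k}_id INTEGER PRIMARY KEY, t{k}_name TEXT)")
        con.executemany(
            f"INSERT INTO t{k} VALUES (?, ?)",
            [(i, f"name{k}_{i}") for i in range(n_rows)],
        )
    return con

def join_query(n_joins, limit=100):
    """Build a query that LEFT JOINs the first n_joins lookup tables to t1."""
    cols = ["t1_id"] + [f"t{k}_name" for k in range(2, n_joins + 2)]
    joins = [f"LEFT JOIN t{k} ON t1_id = t{k}_id" for k in range(2, n_joins + 2)]
    return f"SELECT {', '.join(cols)} FROM t1 {' '.join(joins)} LIMIT {limit}"

def time_joins(con, max_joins):
    """Return [(number of joins, seconds, rows returned)] for 1..max_joins joins."""
    results = []
    for n in range(1, max_joins + 1):
        start = time.perf_counter()
        rows = con.execute(join_query(n)).fetchall()
        results.append((n, time.perf_counter() - start, len(rows)))
    return results

con = build_db(n_lookups=8)
timings = time_joins(con, max_joins=8)
for n, seconds, n_rows in timings:
    print(f"{n} joins: {seconds:.4f}s, {n_rows} rows")
```

Pointing the same loop at the real database (with its own driver and the real table names) should show at which join count the time starts to climb.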

Keijro's suggestion to use correlated subqueries instead of joins, e.g.

SELECT t1_id,
    (select t2_name from t2 where t1_id = t2_id),
    (select t3_name from t3 where t1_id = t3_id),
    (select t4_name from t4 where t1_id = t4_id),
    (select t5_name from t5 where t1_id = t5_id),
    (select t6_name from t6 where t1_id = t6_id),
    (select t7_name from t7 where t1_id = t7_id),
    (select t8_name from t8 where t1_id = t8_id),
    (select t9_name from t9 where t1_id = t9_id)
FROM t1

improved query performance dramatically. In fact, the subqueries did not seem to lengthen the execution time at all (the query was almost instantaneous).

I am a little surprised, as I thought correlated subqueries performed worse than joins.


Depending on how much data is in the tables, you may need to place indexes on the columns that are being joined against. Often slow querying speed comes down to lack of an index in the right place.

Also:

LEFT JOINs are often slower than INNER JOINs (though this depends on exactly what you're doing), partly because an outer join restricts the order in which the optimizer can read the tables. Can you accomplish what you're looking for with inner joins?


It would help a bit if you could post the explain plan of the query.

But first of all, do you have indexes on all the fields used in the joins? Something like CREATE INDEX ix_t2_id ON t2 (t2_id, t2_name);
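
To check whether such an index is actually picked up, compare the plan before and after creating it. The sketch below does that with Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN, purely as a stand-in: the plan helper and the sample data are made up, and on MySQL you would run EXPLAIN on the real query instead, with differently formatted output.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# No primary key on purpose, so t2 starts out with no usable index on t2_id.
con.execute("CREATE TABLE t2 (t2_id INTEGER, t2_name TEXT)")
con.executemany("INSERT INTO t2 VALUES (?, ?)", [(i, f"name{i}") for i in range(1000)])

def plan(con, sql):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

lookup = "SELECT t2_name FROM t2 WHERE t2_id = 42"

before = plan(con, lookup)
# Same index definition as suggested above: join column first, then the selected column.
con.execute("CREATE INDEX ix_t2_id ON t2 (t2_id, t2_name)")
after = plan(con, lookup)

print("before:", before)  # should describe a scan of the whole table
print("after: ", after)   # should mention ix_t2_id
```

Putting t2_name in the index as well lets the lookup be answered from the index alone, without touching the table rows.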

Instead of the joins you could do something like

SELECT t1_id, 
    (select t2_name from t2 where t1_id = t2_id), 
    (select t3_name from t3 where t1_id = t3_id), 
    (select t4_name from t4 where t1_id = t4_id), 
    (select t5_name from t5 where t1_id = t5_id), 
    (select t6_name from t6 where t1_id = t6_id), 
    (select t7_name from t7 where t1_id = t7_id), 
    (select t8_name from t8 where t1_id = t8_id), 
    (select t9_name from t9 where t1_id = t9_id) 
FROM t1 

But, with a good query planner, that shouldn't differ from the joins.
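
A quick way to convince yourself that the two forms return the same rows is to run both on a tiny data set. This is a sketch under assumptions: it uses SQLite through Python's sqlite3 module rather than the asker's database, the sample rows are invented, and the join form is written with LEFT JOIN because a scalar subquery yields NULL when no row matches. It says nothing about which form is faster on a given server.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (t1_id INTEGER PRIMARY KEY);
    CREATE TABLE t2 (t2_id INTEGER PRIMARY KEY, t2_name TEXT);
    CREATE TABLE t3 (t3_id INTEGER PRIMARY KEY, t3_name TEXT);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 'a'), (2, 'b');   -- no match for t1_id = 3
    INSERT INTO t3 VALUES (2, 'x'), (3, 'y');   -- no match for t1_id = 1
""")

# One correlated scalar subquery per lookup table.
subquery_form = """
    SELECT t1_id,
           (SELECT t2_name FROM t2 WHERE t1_id = t2_id),
           (SELECT t3_name FROM t3 WHERE t1_id = t3_id)
    FROM t1
    ORDER BY t1_id
"""

# The same lookups written as outer joins.
join_form = """
    SELECT t1_id, t2_name, t3_name
    FROM t1
    LEFT JOIN t2 ON t1_id = t2_id
    LEFT JOIN t3 ON t1_id = t3_id
    ORDER BY t1_id
"""

sub_rows = con.execute(subquery_form).fetchall()
join_rows = con.execute(join_form).fetchall()
print(sub_rows)
print(sub_rows == join_rows)
```

The equivalence relies on each lookup key being unique. If a lookup table could hold several rows for one id, the join would return extra rows, while the subquery form would raise an error in MySQL or silently pick one row in SQLite.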


How much data are we talking about? It may be that you have a lot of data and, because the WHERE clause is applied at the end of the query process, you are joining huge volumes of data before filtering it.

In that case it's better to filter the data as early as possible: if you can restrict the rows from t1 in an inner select first, all the other joins will work against a much smaller set of data.

SELECT <your fields>
FROM (
    SELECT * FROM t1 WHERE t1_id = t1_value
) t1
INNER JOIN t2
    ON t1.t1_id = t2.t2_id
...

If it's not masses of data, check that your indexes are correct, then check server-level things: index fragmentation, disk queues, etc.
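
The filter-early idea can be sketched as follows, again using SQLite through Python's sqlite3 module as a stand-in. The table contents and the t1_id < ? filter are invented for the example. It only shows that both forms return the same rows; many optimizers push the filter down on their own, so compare the plans on your own server before assuming the derived table helps.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (t1_id INTEGER PRIMARY KEY, t1_value TEXT);
    CREATE TABLE t2 (t2_id INTEGER PRIMARY KEY, t2_name TEXT);
""")
con.executemany("INSERT INTO t1 VALUES (?, ?)", [(i, f"v{i}") for i in range(1000)])
con.executemany("INSERT INTO t2 VALUES (?, ?)", [(i, f"n{i}") for i in range(1000)])

# Filter t1 inside a derived table, then join the reduced set to t2.
filter_first = """
    SELECT f.t1_id, f.t1_value, t2.t2_name
    FROM (SELECT * FROM t1 WHERE t1_id < ?) AS f
    INNER JOIN t2 ON f.t1_id = t2.t2_id
    ORDER BY f.t1_id
"""

# Join everything, then filter in the outer WHERE clause.
filter_last = """
    SELECT t1.t1_id, t1.t1_value, t2.t2_name
    FROM t1
    INNER JOIN t2 ON t1.t1_id = t2.t2_id
    WHERE t1.t1_id < ?
    ORDER BY t1.t1_id
"""

early = con.execute(filter_first, (10,)).fetchall()
late = con.execute(filter_last, (10,)).fetchall()
print(len(early), early == late)
```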


If you need all the rows of t1, and you left join on the primary key (I guess it's also the clustered index) of the other tables, there is no way to improve the speed of the query.

To improve performance, you either need to reduce the result set or resort to a nasty trick (e.g. make a denormalized copy of the data).


From your query plan I can conclude that the tables referred to as s, n and q do not have an index on the field they are being joined on.

Since there are a lot of rows in these tables (about 400,000 rows in their Cartesian product) and MySQL's only way to do JOINs is using NESTED LOOPS (hash joins only arrived in MySQL 8.0.18), it will really take forever.

Create an index on these tables or define the joined field as a PRIMARY KEY.
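
To see why nested loops without an index get expensive, here is a toy model in plain Python. It is not how MySQL is implemented: the two functions and the 200-row tables are invented, and a dict stands in for an index (MySQL's indexes are usually B-trees), but the comparison counts show the shape of the problem.

```python
def nested_loop_join(outer, inner):
    """Join two lists of (key, value) rows by comparing every pair of rows."""
    comparisons = 0
    result = []
    for o_key, o_val in outer:
        for i_key, i_val in inner:
            comparisons += 1
            if o_key == i_key:
                result.append((o_key, o_val, i_val))
    return result, comparisons

def indexed_join(outer, inner):
    """Same join, but probe a prebuilt lookup (a stand-in for an index) once per outer row."""
    index = {}
    for i_key, i_val in inner:
        index.setdefault(i_key, []).append(i_val)
    probes = 0
    result = []
    for o_key, o_val in outer:
        probes += 1
        for i_val in index.get(o_key, []):
            result.append((o_key, o_val, i_val))
    return result, probes

s = [(i, f"s{i}") for i in range(200)]
n = [(i, f"n{i}") for i in range(200)]

rows_a, comparisons = nested_loop_join(s, n)
rows_b, probes = indexed_join(s, n)
print(comparisons, probes, rows_a == rows_b)
```

With 200 rows on each side the unindexed version makes 200 × 200 = 40,000 comparisons, while the indexed one makes 200 probes for the same result. Add a third unindexed table and the work multiplies again.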