Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When is better Multiple Queries instead of Multiple Joins?

There are a lot of similar questions of the type "Multiple Queries vs Single Query" in SO.
But I didn't see any with a general conclusion, therefore I'm still confused about this.

So, I will ask it in other terms:

When is better to run Multiple Queries instead of a Single Query with Multiple Joins?

I'm not asking for the trivial cases, obviously joining two, or 3 tables is much faster than executing 3 queries.

I'm thinking in cases for example where you have 10+ joins, and some of those joins are many to many relations, so your final query has GROUP_CONCAT, a mix of LEFT and INNER joins, etc.

For example, you want the product name, but also all their images, and also all their tags, and also all their videos, and also all the directions where you can buy it.
Is better to make a very long query with complex joins and group_concat (which is many times really difficult to manage if you can't use distinct), or executing a query for the product details, a query for the images, another one for the tags, etc. ?

I can write a particular example if it helps to clarify the question. But I was hoping a general rule for this situations.
Where is the limit? when a single query with Joins is worst than multiple queries?

and also, in those cases when is better to run multiple SELECT queries:
is faster to run them inside a transaction (autocommit = false) ?
is faster to merge those multiple selects inside a single query with multiple subselects?

Thanks !

like image 474
Enrique Avatar asked Dec 07 '11 14:12

Enrique


4 Answers

Where is the limit? when a single query with Joins is worst than multiple queries?

I dont think it is easy to draw a limit, it depends a lot on your scenario and situations. There might be multiple factors like indexes, partitioning, joining columns, number of rows, structure of query e.t.c.

multiple joins, eg joining 5 columns, where joining columns are keys, values are not same for most of rows (eg gender) and have proper indexes might be faster then the query which joins only two tables without proper indexes.

I guess One might set limits for oneself, eg you can decide that this particular use case (eg insert, or selecting) must not take more than 1 second, if it is taking more than that, more optimization might be required.

like image 117
Zohaib Avatar answered Nov 20 '22 00:11

Zohaib


"It depends" is honestly the only valid answer. There is and can be no hard-and-fast "if greater than X joins then break it up" rule. (If there were, then X would have to change every few years. Stuff I write today would probably bog down the average server 10 years ago.)

With that said, the best tool for deteriming that cutoff point is experience. The more you write, test, and experiment with code, CROSS JOIN the more familiar you are with the hardware and data sets you have to work with "now", the better you will be able to write optimal queries. This is absolutely not to say that only gurus who sneer at the extensions of the SQL-92 standards can write optimmal queries. With reasonable effort new programmers can produce code that is "Good Enough" and, as the name says, that generally is good enough for most tasks.

like image 26
Philip Kelley Avatar answered Nov 20 '22 00:11

Philip Kelley


Where is the limit? when a single query with Joins is worst than multiple queries?

That would depend on the optimiser. As the query grows more complex, the risk of the optimiser selecting a poor execution plan increases.

Just selecting the order in which to process the tables can be done in N! ways, where N is the number of tables queried. With 5 tables there are 120 ways, with 10 tables a whopping 3628800. And that is just for one of the decisions that the optimiser must make.

like image 1
Klas Lindbäck Avatar answered Nov 20 '22 00:11

Klas Lindbäck


I would say you would join rather than run separate selects when you need the related data all at once OR if the related data is really big (e.g. LOBS with images...).

If you don't need the large related data all at once, then think "lazy initialization", where you query that large data when asked for.

like image 1
Roger Cornejo Avatar answered Nov 19 '22 23:11

Roger Cornejo