Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

JOIN queries vs multiple queries

People also ask

Are joins faster than separate queries?

Yes, one query using JOINS would be quicker. Although without knowing the relationships of the tables you are querying, the size of your dataset, or where the primary keys are, it's almost impossible to say how much faster.

Is join more efficient than subquery?

Advantages Of Joins:The advantage of a join includes that it executes faster. The retrieval time of the query using joins almost always will be faster than that of a subquery. By using joins, you can maximize the calculation burden on the database i.e., instead of multiple queries using one join query.

Which is better join or nested query?

For a join, we would be required to fetch the whole table from each site and create a large table from which the filtering will occur, hence more time will be required. So for distributed databases, nested queries are better.

Is join more efficient than WHERE?

“Is there a performance difference between putting the JOIN conditions in the ON clause or the WHERE clause in MySQL?” No, there's no difference. The following queries are algebraically equivalent inside MySQL and will have the same execution plan.


For inner joins, a single query makes sense, since you only get matching rows. For left joins, multiple queries is much better... look at the following benchmark I did:

  1. Single query with 5 Joins

    query: 8.074508 seconds

    result size: 2268000

  2. 5 queries in a row

    combined query time: 0.00262 seconds

    result size: 165 (6 + 50 + 7 + 12 + 90)

.

Note that we get the same results in both cases (6 x 50 x 7 x 12 x 90 = 2268000)

left joins use exponentially more memory with redundant data.

The memory limit might not be as bad if you only do a join of two tables, but generally three or more and it becomes worth different queries.

As a side note, my MySQL server is right beside my application server... so connection time is negligible. If your connection time is in the seconds, then maybe there is a benefit

Frank


This is way too vague to give you an answer relevant to your specific case. It depends on a lot of things. Jeff Atwood (founder of this site) actually wrote about this. For the most part, though, if you have the right indexes and you properly do your JOINs it is usually going to be faster to do 1 trip than several.


This question is old, but is missing some benchmarks. I benchmarked JOIN against its 2 competitors:

  • N+1 queries
  • 2 queries, the second one using a WHERE IN(...) or equivalent

The result is clear: on MySQL, JOIN is much faster. N+1 queries can drop the performance of an application drastically:

JOIN vs WHERE IN vs N+1

That is, unless you select a lot of records that point to a very small number of distinct, foreign records. Here is a benchmark for the extreme case:

JOIN vs N+1 - all records pointing to the same foreign record

This is very unlikely to happen in a typical application, unless you're joining a -to-many relationship, in which case the foreign key is on the other table, and you're duplicating the main table data many times.

Takeaway:

  • For *-to-one relationships, always use JOIN
  • For *-to-many relationships, a second query might be faster

See my article on Medium for more information.


I actually came to this question looking for an answer myself, and after reading the given answers I can only agree that the best way to compare DB queries performance is to get real-world numbers because there are just to many variables to be taken into account BUT, I also think that comparing the numbers between them leads to no good in almost all cases. What I mean is that the numbers should always be compared with an acceptable number and definitely not compared with each other.

I can understand if one way of querying takes say 0.02 seconds and the other one takes 20 seconds, that's an enormous difference. But what if one way of querying takes 0.0000000002 seconds, and the other one takes 0.0000002 seconds ? In both cases one way is a whopping 1000 times faster than the other one, but is it really still "whopping" in the second case ?

Bottom line as I personally see it: if it performs well, go for the easy solution.


Did a quick test selecting one row from a 50,000 row table and joining with one row from a 100,000 row table. Basically looked like:

$id = mt_rand(1, 50000);
$row = $db->fetchOne("SELECT * FROM table1 WHERE id = " . $id);
$row = $db->fetchOne("SELECT * FROM table2 WHERE other_id = " . $row['other_id']);

vs

$id = mt_rand(1, 50000);
$db->fetchOne("SELECT table1.*, table2.*
    FROM table1
    LEFT JOIN table1.other_id = table2.other_id
    WHERE table1.id = " . $id);

The two select method took 3.7 seconds for 50,000 reads whereas the JOIN took 2.0 seconds on my at-home slow computer. INNER JOIN and LEFT JOIN did not make a difference. Fetching multiple rows (e.g., using IN SET) yielded similar results.


The real question is: Do these records have a one-to-one relationship or a one-to-many relationship?

TLDR Answer:

If one-to-one, use a JOIN statement.

If one-to-many, use one (or many) SELECT statements with server-side code optimization.

Why and How To Use SELECT for Optimization

SELECT'ing (with multiple queries instead of joins) on large group of records based on a one-to-many relationship produces an optimal efficiency, as JOIN'ing has an exponential memory leak issue. Grab all of the data, then use a server-side scripting language to sort it out:

SELECT * FROM Address WHERE Personid IN(1,2,3);

Results:

Address.id : 1            // First person and their address
Address.Personid : 1
Address.City : "Boston"

Address.id : 2            // First person's second address
Address.Personid : 1
Address.City : "New York"

Address.id : 3            // Second person's address
Address.Personid : 2
Address.City : "Barcelona"

Here, I am getting all of the records, in one select statement. This is better than JOIN, which would be getting a small group of these records, one at a time, as a sub-component of another query. Then I parse it with server-side code that looks something like...

<?php
    foreach($addresses as $address) {
         $persons[$address['Personid']]->Address[] = $address;
    }
?>

When Not To Use JOIN for Optimization

JOIN'ing a large group of records based on a one-to-one relationship with one single record produces an optimal efficiency compared to multiple SELECT statements, one after the other, which simply get the next record type.

But JOIN is inefficient when getting records with a one-to-many relationship.

Example: The database Blogs has 3 tables of interest, Blogpost, Tag, and Comment.

SELECT * from BlogPost
LEFT JOIN Tag ON Tag.BlogPostid = BlogPost.id
LEFT JOIN Comment ON Comment.BlogPostid = BlogPost.id;

If there is 1 blogpost, 2 tags, and 2 comments, you will get results like:

Row1: tag1, comment1,
Row2: tag1, comment2,
Row3: tag2, comment1,
Row4: tag2, comment2,

Notice how each record is duplicated. Okay, so, 2 comments and 2 tags is 4 rows. What if we have 4 comments and 4 tags? You don't get 8 rows -- you get 16 rows:

Row1: tag1, comment1,
Row2: tag1, comment2,
Row3: tag1, comment3,
Row4: tag1, comment4,
Row5: tag2, comment1,
Row6: tag2, comment2,
Row7: tag2, comment3,
Row8: tag2, comment4,
Row9: tag3, comment1,
Row10: tag3, comment2,
Row11: tag3, comment3,
Row12: tag3, comment4,
Row13: tag4, comment1,
Row14: tag4, comment2,
Row15: tag4, comment3,
Row16: tag4, comment4,

Add more tables, more records, etc., and the problem will quickly inflate to hundreds of rows that are all full of mostly redundant data.

What do these duplicates cost you? Memory (in the SQL server and the code that tries to remove the duplicates) and networking resources (between SQL server and your code server).

Source: https://dev.mysql.com/doc/refman/8.0/en/nested-join-optimization.html ; https://dev.mysql.com/doc/workbench/en/wb-relationship-tools.html


Construct both separate queries and joins, then time each of them -- nothing helps more than real-world numbers.

Then even better -- add "EXPLAIN" to the beginning of each query. This will tell you how many subqueries MySQL is using to answer your request for data, and how many rows scanned for each query.