Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which is faster: JOIN with GROUP BY or a Subquery?

Let's say we have two tables: 'Car' and 'Part', with a joining table in 'Car_Part'. Say I want to see all cars that have a part 123 in them. I could do this:

SELECT Car.Col1, Car.Col2, Car.Col3 
FROM Car
INNER JOIN Car_Part ON Car_Part.Car_Id = Car.Car_Id
WHERE Car_Part.Part_Id = @part_to_look_for
GROUP BY Car.Col1, Car.Col2, Car.Col3

Or I could do this

SELECT Car.Col1, Car.Col2, Car.Col3 
FROM Car
WHERE Car.Car_Id IN (SELECT Car_Id FROM Car_Part WHERE Part_Id = @part_to_look_for)

Now, everything in me wants to use the first method because I've been brought up by good parents who instilled in me a puritanical hatred of sub-queries and a love of set theory, but it has been suggested to me that doing that big GROUP BY is worse than a sub-query.

I should point out that we're on SQL Server 2008. I should also say that in reality I want to select based the Part Id, Part Type and possibly other things too. So, the query I want to do actually looks like this:

SELECT Car.Col1, Car.Col2, Car.Col3 
FROM Car
INNER JOIN Car_Part ON Car_Part.Car_Id = Car.Car_Id
INNER JOIN Part ON Part.Part_Id = Car_Part.Part_Id
WHERE (@part_Id IS NULL OR Car_Part.Part_Id = @part_Id)
AND (@part_type IS NULL OR Part.Part_Type = @part_type)
GROUP BY Car.Col1, Car.Col2, Car.Col3

Or...

SELECT Car.Col1, Car.Col2, Car.Col3 
FROM Car
WHERE (@part_Id IS NULL OR Car.Car_Id IN (
    SELECT Car_Id 
    FROM Car_Part 
    WHERE Part_Id = @part_Id))
AND (@part_type IS NULL OR Car.Car_Id IN (
    SELECT Car_Id
    FROM Car_Part
    INNER JOIN Part ON Part.Part_Id = Car_Part.Part_Id
    WHERE Part.Part_Type = @part_type))
like image 237
d4nt Avatar asked Jul 01 '10 08:07

d4nt


People also ask

Which is faster subquery or joins?

Advantages Of Joins: The retrieval time of the query using joins almost always will be faster than that of a subquery. By using joins, you can maximize the calculation burden on the database i.e., instead of multiple queries using one join query.

Which is faster group by or join?

Even more, even all the previous are set still, it depends on your data (and its distribution) and the specific parameters of the query. For example, if the 7 books are the 7 books of Harry Potter and all your users are Harry Potter's fans, then the GROUP BY/HAVING may be faster.

Which type of join is fastest?

If you dont include the items of the left joined table, in the select statement, the left join will be faster than the same query with inner join. If you do include the left joined table in the select statement, the inner join with the same query was equal or faster than the left join.

Should you use a join or a subquery?

If you need to combine related information from different rows within a table, then you can join the table with itself. Use subqueries when the result that you want requires more than one query and each subquery provides a subset of the table involved in the query.


1 Answers

The best thing you can do is test them yourself, on realistic data volumes. That would not only benefit for this query, but for all future queries when you are not sure which is the best way.

Important things to do include:
- test on production level data volumes
- test fairly & consistently (clear cache: http://www.adathedev.co.uk/2010/02/would-you-like-sql-cache-with-that.html)
- check the execution plan

You could either monitor using SQL Profiler and check the duration/reads/writes/CPU there, or SET STATISTICS IO ON; SET STATISTICS TIME ON; to output stats in SSMS. Then compare the stats for each query.

If you can't do this type of testing, you'll be potentially exposing yourself to performance problems down the line that you'll have to then tune/rectify. There are tools out there you can use that will generate data for you.

like image 122
AdaTheDev Avatar answered Sep 21 '22 20:09

AdaTheDev