Which is faster: JOIN with GROUP BY or a Subquery?

Tags:

join

sql-server

sql-server-2008

group-by

subquery

Let's say we have two tables, 'Car' and 'Part', with a joining table, 'Car_Part'. Say I want to see all cars that have part 123 in them. I could do this:

SELECT Car.Col1, Car.Col2, Car.Col3 
FROM Car
INNER JOIN Car_Part ON Car_Part.Car_Id = Car.Car_Id
WHERE Car_Part.Part_Id = @part_to_look_for
GROUP BY Car.Col1, Car.Col2, Car.Col3

Or I could do this:

SELECT Car.Col1, Car.Col2, Car.Col3 
FROM Car
WHERE Car.Car_Id IN (SELECT Car_Id FROM Car_Part WHERE Part_Id = @part_to_look_for)

Now, everything in me wants to use the first method because I've been brought up by good parents who instilled in me a puritanical hatred of sub-queries and a love of set theory, but it has been suggested to me that doing that big GROUP BY is worse than a sub-query.

I should point out that we're on SQL Server 2008. I should also say that in reality I want to select based on the Part Id, Part Type and possibly other things too. So, the query I want to do actually looks like this:

SELECT Car.Col1, Car.Col2, Car.Col3 
FROM Car
INNER JOIN Car_Part ON Car_Part.Car_Id = Car.Car_Id
INNER JOIN Part ON Part.Part_Id = Car_Part.Part_Id
WHERE (@part_Id IS NULL OR Car_Part.Part_Id = @part_Id)
AND (@part_type IS NULL OR Part.Part_Type = @part_type)
GROUP BY Car.Col1, Car.Col2, Car.Col3

Or...

SELECT Car.Col1, Car.Col2, Car.Col3 
FROM Car
WHERE (@part_Id IS NULL OR Car.Car_Id IN (
    SELECT Car_Id 
    FROM Car_Part 
    WHERE Part_Id = @part_Id))
AND (@part_type IS NULL OR Car.Car_Id IN (
    SELECT Car_Id
    FROM Car_Part
    INNER JOIN Part ON Part.Part_Id = Car_Part.Part_Id
    WHERE Part.Part_Type = @part_type))

237

asked Jul 01 '10 08:07

d4nt

1 Answer

The best thing you can do is test them yourself, on realistic data volumes. That will help not only with this query, but with any future query where you are not sure which approach is best.

Important things to do include:
- test on production-level data volumes
- test fairly & consistently (clear cache: http://www.adathedev.co.uk/2010/02/would-you-like-sql-cache-with-that.html)
- check the execution plan
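
For the "clear cache" point, here is a minimal sketch of what a cold-cache reset typically looks like on SQL Server. It is only meant for a test or development instance, because these commands affect the whole server, and whether you want a cold cache at all for your comparison is a judgment call.

```sql
-- Test/dev servers only: these commands affect the entire instance.
CHECKPOINT;             -- flush dirty pages to disk so they become clean buffers
DBCC DROPCLEANBUFFERS;  -- empty the data cache, so the next run reads from disk
DBCC FREEPROCCACHE;     -- discard cached execution plans, forcing recompilation
```

Run the same reset before each variant so both start from the same state.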

You could either monitor with SQL Profiler and check the duration/reads/writes/CPU there, or run SET STATISTICS IO ON; SET STATISTICS TIME ON; to output the stats in SSMS. Then compare the stats for each query.
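
As a rough sketch of the SSMS approach, you could run both variants from the question in one batch and compare the output in the Messages tab. The INT type for the parameter and the value 123 are assumptions for illustration; adjust them to your schema.

```sql
-- Assumption: Part_Id is an INT; 123 is just an example value.
DECLARE @part_to_look_for INT;
SET @part_to_look_for = 123;

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: JOIN with GROUP BY
SELECT Car.Col1, Car.Col2, Car.Col3
FROM Car
INNER JOIN Car_Part ON Car_Part.Car_Id = Car.Car_Id
WHERE Car_Part.Part_Id = @part_to_look_for
GROUP BY Car.Col1, Car.Col2, Car.Col3;

-- Variant 2: IN with a subquery
SELECT Car.Col1, Car.Col2, Car.Col3
FROM Car
WHERE Car.Car_Id IN (SELECT Car_Id FROM Car_Part WHERE Part_Id = @part_to_look_for);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

STATISTICS IO reports logical and physical reads per table, and STATISTICS TIME reports CPU and elapsed time per statement, so you get a side-by-side comparison of the two variants.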

If you can't do this type of testing, you are potentially exposing yourself to performance problems down the line that you will then have to tune or rectify. There are tools out there that will generate test data for you.

122

answered Sep 21 '22 20:09

AdaTheDev
