Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Transact-SQL - sub query or left-join?

Tags:

sql

tsql

I have two tables containing Tasks and Notes, and want to retrieve a list of tasks with the number of associated notes for each one. These two queries do the job:

select t.TaskId,
       (select count(n.TaskNoteId) from TaskNote n where n.TaskId = t.TaskId) 'Notes'
from   Task t

-- or
select t.TaskId,
       count(n.TaskNoteId) 'Notes'
from   Task t
left join
       TaskNote n
on     t.TaskId = n.TaskId
group by t.TaskId

Is there a difference between them and should I be using one over the other, or are they just two ways of doing the same job? Thanks.

like image 329
Nick Avatar asked Sep 22 '08 22:09

Nick


4 Answers

On small datasets they are wash when it comes to performance. When indexed, the LOJ is a little better.

I've found on large datasets that an inner join (an inner join will work too.) will outperform the subquery by a very large factor (sorry, no numbers).

like image 122
Austin Salonen Avatar answered Nov 01 '22 04:11

Austin Salonen


In most cases, the optimizer will treat them the same.

I tend to prefer the second, because it has less nesting, which makes it easier to read and easier to maintain. I have started to use SQL Server's common table expressions to reduce nesting as well for the same reason.

In addition, the second syntax is more flexible if there are further aggregates which may be added in the future in addition to COUNT, like MIN(some_scalar), MAX(), AVG() etc.

like image 6
Cade Roux Avatar answered Nov 01 '22 02:11

Cade Roux


The subquery will be slower as it is being executed for every row in the outer query. The join will be faster as it is done once. I believe that the query optimiser will not rewrite this query plan as it can't recognize the equivalence.

Normally you would do a join and group by for this sort of count. Correlated subqueries of the sort you show are mainly of interest if they have to do some grouping or more complex predicate on a table that is not participating in another join.

like image 5
ConcernedOfTunbridgeWells Avatar answered Nov 01 '22 04:11

ConcernedOfTunbridgeWells


If you're using SQL Server Management Studio, you can enter both versions into the Query Editor and then right-click and choose Display Estimated Execution Plan. It will give you two percentage costs relative to the batch. If they're expected to take the same time, they'll both show as 50% - in which case, choose whichever you prefer for other reasons (easier to read, easier to maintain, better fit with your coding standards etc). Otherwise, you can pick the one with the lower percentage cost relative to the batch.

You can use the same technique to look at changing any query to improve performance by comparing two versions that do the same thing.

Of course, because it's a cost relative to the batch, it doesn't mean that either query is as fast as it could be - it just tells you how they compare to each other, not to some notional optimum query to get the same results.

like image 2
Simon Forrest Avatar answered Nov 01 '22 04:11

Simon Forrest