Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

filter first or join first?

I originally just wrote a query to find out annual total order number per customer larger than 1. In 1.query, I filtered the result set and join it with another result set which found out customer name. Curiously, I guess filter first would produce better performance since less results needed to join. So I wrote second query to join first and then filter, which looks neater than first query. Result is the same as I expect since all time in the result is lower. But I am not sure which time is most important? Or this case is just a coincident? How to think about performance?

use [AdventureWorks2012]
set statistics time on;

--1.filter first,join second
select tempC.*,tempP.FirstName,tempP.LastName
from 
(select  Year(OrderDate) As OrderYear,CustomerID,count(CustomerID) As CustomerOrderAmt
from Sales.SalesOrderHeader 
group by Year(OrderDate),CustomerID 
having count(CustomerID) >1
) as tempC
join(
select p.FirstName,p.LastName,c.CustomerID
from Person.Person as p join Sales.Customer as c on c.PersonID=p.BusinessEntityID
) as tempP
on tempC.CustomerID=tempP.CustomerID
order by tempC.OrderYear,tempC.CustomerID
GO

--2.join first,filter second

select Year(so.OrderDate) As Orderdate,so.CustomerID,count(so.CustomerID) As CustomerOrderAmt,p.FirstName,p.LastName
from Sales.SalesOrderHeader as so
join Sales.Customer as C on so.CustomerID=c.CustomerID
join Person.Person as p on c.PersonID=p.BusinessEntityID
group by Year(so.OrderDate),so.CustomerID,p.FirstName,p.LastName
having count(so.CustomerID)>1
go
like image 597
FebWind Avatar asked Jun 21 '13 01:06

FebWind


People also ask

Does WHERE filter before or after join?

Normally, filtering is processed in the WHERE clause once the two tables have already been joined. It's possible, though that you might want to filter one or both of the tables before joining them. For example, you only want to create matches between the tables under certain circumstances.

What comes first WHERE or join?

Save this answer. Show activity on this post. The rows selected by a query are filtered first by the FROM clause join conditions, then the WHERE clause search conditions, and then the HAVING clause search conditions. Inner joins can be specified in either the FROM or WHERE clause without affecting the final result.

What's better from a performance point of view filtering data first and then joining it with other sources or joining it first and then filtering?

What's better from a performance point of view: Filtering data first and then joining it with other sources, or joining it first and then filtering? It's better to filter data first and then join it with other sources.

What is the order of operations in SQL?

Six Operations to Order: SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY.


2 Answers

The query optimizer can choose to perform actions in any order that produces the same logical result, so even if you try to filter first and then join, unless you force it by using a table variable or temp table, the optimizer may join then filter.

If you really believe the optimizer is doing something stupid, you can try things like the table var or temp table, but what seems stupid may not actually be, for reasons that get pretty advanced.

That said, sometimes the way you write the query will affect what the optimizer does, so you should generally look at the execution plans. If they are the same, go with the clearest code. If they aren't test and test again, and go with what seems best.

like image 102
Nathan Henkel Avatar answered Sep 19 '22 19:09

Nathan Henkel


I consider as a good practice to use sub-queries allowing to reduce the overall number of joining operations and the amount of the columns in the block GROUP BY. Therefore, I will tell you at once that the first query is definitely more efficient.

Queries:

SELECT  
      t.OrderYear
    , t.CustomerID
    , t.CustomerOrderAmt 
    , p.FirstName 
    , p.LastName
FROM ( 
     SELECT 
            OrderYear = YEAR(OrderDate)  
          , CustomerID 
          , CustomerOrderAmt = COUNT(CustomerID) 
     FROM Sales.SalesOrderHeader
     GROUP BY 
            YEAR(OrderDate) 
          , CustomerID
     HAVING COUNT(CustomerID) > 1
) t
JOIN ( 
     SELECT   
            p.FirstName 
          , p.LastName 
          , c.CustomerID
     FROM Person.Person p
     JOIN Sales.Customer c ON c.PersonID = p.BusinessEntityID
) p ON t.CustomerID = p.CustomerID
ORDER BY 
       t.OrderYear 
     , t.CustomerID

vs

SELECT  
       Orderdate = YEAR(so.OrderDate)  
     , so.CustomerID 
     , CustomerOrderAmt = COUNT(so.CustomerID)  
     , FirstName = MAX(p.FirstName)
     , LastName = MAX(p.LastName)
FROM Sales.SalesOrderHeader so
JOIN Sales.Customer c ON so.CustomerID = c.CustomerID
JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
GROUP BY 
       YEAR(so.OrderDate) 
     , so.CustomerID 
HAVING COUNT(so.CustomerID) > 1

Query cost:

Query cost

Execution time:

-- first query
SQL Server Execution Times:
   CPU time = 94 ms,  elapsed time = 395 ms.

-- second query   
SQL Server Execution Times:
   CPU time = 140 ms,  elapsed time = 480 ms.
like image 38
Devart Avatar answered Sep 20 '22 19:09

Devart