Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Inner Join with derived table using sub query

Environment: SQL 2008 R2

I created a derived table using sub query and joined with main table. I just like to know if subquery is executed only once or will it be executed for each row in result set. Consider following example (fictional table names for reference only)

SELECT E.EID,DT.Salary FROM Employees E
INNER JOIN
(
    SELECT EID, (SR.Rate * AD.DaysAttended) Salary
    FROM SalaryRate SR
    INNER JOIN AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID

So, the subquery used for Inner Join will be executed only once or multiple times??

If I re-write above query using OUTER APPLY, I know for sure the subquery will be executed for each row. See Below.

SELECT E.EID,DT.Salary FROM Employees E
OUTER APPLY
(
    SELECT (SR.Rate * AD.DaysAttended) Salary
    FROM SalaryRate SR
    INNER JOIN AttendanceDetails AD on AD.EID=SR.EID
    WHERE SR.EID=E.EID
) DT --Derived Table for outer apply

So just want to ensure that Inner Join will execute sub query only once.

like image 282
love kumar Avatar asked Oct 21 '14 12:10

love kumar


2 Answers

The first thing to note is that your queries are not comparable, OUTER APPLY needs to be replaced with CROSS APPLY, or INNER JOIN with LEFT JOIN.

When they are made comparable though, you can see that the query plans for both queries are identical. I have just mocked up a sample DDL:

CREATE TABLE #Employees (EID INT NOT NULL);
INSERT #Employees VALUES (0);
CREATE TABLE #SalaryRate (EID INT NOT NULL, Rate MONEY NOT NULL);
CREATE TABLE #AttendanceDetails (EID INT NOT NULL, DaysAttended INT NOT NULL);

Running the following:

SELECT E.EID,DT.Salary FROM #Employees E
OUTER APPLY
(
    SELECT (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
    WHERE SR.EID=E.EID
) DT; --Derived Table for outer apply

SELECT E.EID,DT.Salary FROM #Employees E
LEFT JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID;

Gives the following plan:

enter image description here

And changing to INNER/CROSS:

SELECT E.EID,DT.Salary FROM #Employees E
CROSS APPLY
(
    SELECT (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
    WHERE SR.EID=E.EID
) DT; --Derived Table for outer apply


SELECT E.EID,DT.Salary FROM #Employees E
INNER JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID;

Gives the following plan:

enter image description here

These are the plans where there is no data in the outer tables, and only one row in employees, so not really realistic. In the case of the outer apply, SQL Server is able to determine that there is only one row in employees, so it would be beneficial to just do a nested loop join (i.e. row by row lookup) to the outer tables. After putting 1,000 rows in employees, using LEFT JOIN/OUTER APPLY yields the following plan:

enter image description here

You can see here that the join is now a hash match join, which means (in it's simplest terms) that SQL Server has determined that the best plan is to execute the outer query first, hash the results and then lookup from employees. This however does not mean that the subquery as a whole is executed and the results stored, for simplicity purposes you could consider this, but predicates from the outer query can still be still be used, for example, if the subquery were executed and stored internally, the following query would present massive overhead:

SELECT E.EID,DT.Salary FROM #Employees E
LEFT JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID
WHERE E.EID = 1;

What whould be the point in retrieving all employee rates, storing the results, only to actually look up one employee? Inspection of the execution plan shows that the EID = 1 predicate is passed to the table scan on #AttendanceDetails:

enter image description here

So the answer to the following points is:

  • If I re-write above query using OUTER APPLY, I know for sure the subquery will be executed for each row.
  • Inner Join will execute sub query only once.

It depends. Using APPLY SQL Server will attempt to rewrite the query as a JOIN if possible, as this will yield the optimal plan, so using OUTER APPLY does not guarantee that the query will be executed once for each row. Similarly using LEFT JOIN does not guarantee that the query is executed only once.

SQL is a declarative language, in that you tell it what you want it to do, not how to do it, so you shouldn't rely on specific commands to elicit specific behaviour, instead, if you find performance issues, check the execution plan, and IO statistics to find out how it is doing it, and identify how you can improve your query.

Further more, SQL Server does not matierialise subqueries, usually the definition is expanded out into the main query, so even though you have written:

SELECT E.EID,DT.Salary FROM #Employees E
INNER JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID;

What is actually executed is more like:

SELECT  e.EID, sr.Rate * ad.DaysAttended AS Salary
FROM    #Employees e
        INNER JOIN #SalaryRate sr
            on e.EID = sr.EID
        INNER JOIN #AttendanceDetails ad
            ON ad.EID = sr.EID;
like image 90
GarethD Avatar answered Oct 23 '22 08:10

GarethD


With INNER JOIN your Sub-Query will be execute only once and its records may gets stored internally in tempdb worktable on complex operations, then JOINed with the 1st table.

With APPLY clause, the Sub-Query will be executed for every row in the 1st table.

edit: use CTE

;with SalaryRateCTE as 
(
    SELECT EID, (SR.Rate * AD.DaysAttended) AS Salary
    FROM SalaryRate SR
    INNER JOIN AttendanceDetails AD on AD.EID=SR.EID
)
SELECT E.EID, DT.Salary 
FROM Employees E
INNER JOIN SalaryRateCTE DT --Derived Table for inner join
ON DT.EID = E.EID
like image 23
Manoj Pandey Avatar answered Oct 23 '22 08:10

Manoj Pandey