Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the advantages of a query using a derived table(s) over a query not using them?

I know how derived tables are used, but I still can’t really see any real advantages of using them.

For example, in the following article http://techahead.wordpress.com/2007/10/01/sql-derived-tables/ the author tried to show benefits of a query using derived table over a query without one with an example, where we want to generate a report that shows off the total number of orders each customer placed in 1996, and we want this result set to include all customers, including those that didn’t place any orders that year and those that have never placed any orders at all( he’s using Northwind database ).

But when I compare the two queries, I fail to see any advantages of a query using a derived table ( if nothing else, use of a derived table doesn't appear to simplify our code, at least not in this example):

Regular query:

SELECT C.CustomerID, C.CompanyName, COUNT(O.OrderID) AS TotalOrders
FROM Customers C LEFT OUTER JOIN Orders O ON
       C.CustomerID = O.CustomerID AND YEAR(O.OrderDate) = 1996
GROUP BY C.CustomerID, C.CompanyName

Query using a derived table:

SELECT C.CustomerID, C.CompanyName, COUNT(dOrders.OrderID) AS TotalOrders
FROM Customers C LEFT OUTER JOIN
        (SELECT * FROM Orders WHERE YEAR(Orders.OrderDate) = 1996) AS dOrders
     ON
        C.CustomerID = dOrders.CustomerID
GROUP BY C.CustomerID, C.CompanyName

Perhaps this just wasn’t a good example, so could you show me an example where benefits of derived table are more obvious?

thanx

REPLY TO GBN:

In this case, you couldn't capture both products and order aggregates if there is no relation between Customers and Products.

Could you elaborate what exactly you mean? Wouldn’t the following query produce the same result set as your query:

SELECT 
     C.CustomerID, C.CompanyName,
     COUNT(O.OrderID) AS TotalOrders,
     COUNT(DISTINCT P.ProductID) AS DifferentProducts 
FROM Customers C LEFT OUTER JOIN Orders O ON
       C.CustomerID = O.CustomerID AND YEAR(O.OrderDate) = 1996
   LEFT OUTER JOIN Products P ON 
       O.somethingID = P.somethingID  
GROUP BY C.CustomerID, C.CompanyName

REPLY TO CADE ROUX:

In addition, if expressions are used to derive columns from derived columns with a lot of shared intermediate calculations, a set of nested derived tables or stacked CTEs is the only way to do it:

SELECT x, y, z1, z2
FROM (
    SELECT *
           ,x + y AS z1
           ,x - y AS z2
    FROM (
        SELECT x * 2 AS y
        FROM A
    ) AS A
) AS A

Wouldn't the following query produce the same result as your above query:

SELECT x, x * 2 AS y, x + x*2 AS z1, x - x*2 AS z2
FROM A
like image 603
AspOnMyNet Avatar asked Dec 07 '22 03:12

AspOnMyNet


1 Answers

In your examples, the derived table is not strictly necessary. There are numerous cases where you might need to join to an aggregate or similar, and a derived table is really the only way to handle that:

SELECT *
FROM A
LEFT JOIN (
    SELECT x, SUM(y)
    FROM B
    GROUP BY x
) AS B
    ON B.x = A.x

In addition, if expressions are used to derive columns from derived columns with a lot of shared intermediate calculations, a set of nested derived tables or stacked CTEs is the only way to do it:

SELECT x, y, z1, z2
FROM (
    SELECT *
           ,x + y AS z1
           ,x - y AS z2
    FROM (
        SELECT x * 2 AS y
        FROM A
    ) AS A
) AS A

As far as maintainability, using stacked CTEs or derived tables (they are basically equivalent) and can make for more readable and maintainable code, as well as facilitating cut-and-paste re-use and refactoring. The optimizer can typically flatten then very easily.

I typically use stacked CTEs instead of nesting for a little better readability (same two examples):

WITH B AS (
    SELECT x, SUM(y)
    FROM B
    GROUP BY x
)
SELECT *
FROM A
LEFT JOIN B
    ON B.x = A.x

WITH A1 AS (
    SELECT x * 2 AS y
    FROM A
)
,A2 AS (
    SELECT *
           ,x + y AS z1
           ,x - y AS z2
    FROM A1
)
SELECT x, y, z1, z2
FROM A2

Regarding your question about:

SELECT x, x * 2 AS y, x + x*2 AS z1, x - x*2 AS z2 
FROM A 

This has the x * 2 code repeated 3 times. If this business rule needs to change, it will have to change in 3 places - a recipe for injection of defects. This gets compounded any time you have intermediate calculations which need to be consistent and defined in only one place.

This would not be as much of a problem if SQL Server's scalar user-defined functions could be inlined (or if they performed acceptably), you could simply build your UDFs to stack your results and the optimizer would elimnate redundant calls. Unfortunately SQL Server's scalar UDF implementation cannot handle that well for large sets of rows.

like image 67
Cade Roux Avatar answered Jan 07 '23 17:01

Cade Roux