I answered a recursive CTE yesterday that exposed an issue with the way that these are implemented in SQL Server (possibly in other RDBMS, too?). Basically, when I try to use ROW_NUMBER
against the current recursive level, it runs against each row subset of the current recursive level. I would expect that this would work in true SET logic, and run against the entire current recursive level.
It appears that, from this MSDN article, the issue I have found is intended functionality:
Analytic and aggregate functions in the recursive part of the CTE are applied to the set for the current recursion level and not to the set for the CTE. Functions like ROW_NUMBER operate only on the subset of data passed to them by the current recursion level and not the entire set of data pased to the recursive part of the CTE. For more information, see J. Using analytical functions in a recursive CTE.
In my digging, I could find nowhere that explains why this was chosen to work the way it does? This is more of a procedural approach in a set based language, so this works against my SQL thought process and is quite confusing in my opinion. Does anybody know and/or can anybody explain why the recursive CTE treats analytic functions at the recursion level in a procedural fashion?
Here is the code to help visualize this:
Notice, the RowNumber
column in each one of these code outputs.
Here is the SQLFiddle for the CTE (only showing the 2nd level of the recursion)
WITH myCTE
AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Score desc) AS RowNumber, 1 AS RecurseLevel
FROM tblGroups
WHERE ParentId IS NULL
UNION ALL
SELECT tblGroups.*,
ROW_NUMBER() OVER (ORDER BY myCTE.RowNumber , tblGroups.Score desc) AS RowNumber,
RecurseLevel + 1 AS RecurseLevel
FROM tblGroups
JOIN myCTE
ON myCTE.GroupID = tblGroups.ParentID
)
SELECT *
FROM myCTE
WHERE RecurseLevel = 2;
Here is the second SQLFiddle for what I would expect the CTE to do (again only need the 2nd level to display the issue)
WITH myCTE
AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Score desc) AS RowNumber, 1 AS RecurseLevel
FROM tblGroups
WHERE ParentId IS NULL
)
SELECT tblGroups.*,
ROW_NUMBER() OVER (ORDER BY myCTE.RowNumber , tblGroups.Score desc) AS RowNumber,
RecurseLevel + 1 AS RecurseLevel
FROM tblGroups
JOIN myCTE
ON myCTE.GroupID = tblGroups.ParentID;
I always envisioned the SQL recursive CTE to run more like this while loop
DECLARE @RecursionLevel INT
SET @RecursionLevel = 0
SELECT *, ROW_NUMBER() OVER (ORDER BY Score desc) AS RowNumber, @RecursionLevel AS recurseLevel
INTO #RecursiveTable
FROM tblGroups
WHERE ParentId IS NULL
WHILE EXISTS( SELECT tblGroups.* FROM tblGroups JOIN #RecursiveTable ON #RecursiveTable.GroupID = tblGroups.ParentID WHERE recurseLevel = @RecursionLevel)
BEGIN
INSERT INTO #RecursiveTable
SELECT tblGroups.*,
ROW_NUMBER() OVER (ORDER BY #RecursiveTable.RowNumber , tblGroups.Score desc) AS RowNumber,
recurseLevel + 1 AS recurseLevel
FROM tblGroups
JOIN #RecursiveTable
ON #RecursiveTable.GroupID = tblGroups.ParentID
WHERE recurseLevel = @RecursionLevel
SET @RecursionLevel = @RecursionLevel + 1
END
SELECT * FROM #RecursiveTable ORDER BY RecurseLevel;
Recursive CTE Syntax A recursive CTE references itself. It returns the result subset, then it repeatedly (recursively) references itself, and stops when it returns all the results. FROM cte_name; Again, at the beginning of your CTE is the WITH clause.
There are two types of Common Table Expression, Non-Recursive CTE - It does not have any reference to itself in the CTE definition. Recursive CTE - When a CTE has reference in itself, then it's called recursive CTE.
A CTE can be recursive or non-recursive. A recursive CTE is a CTE that references itself. A recursive CTE can join a table to itself as many times as necessary to process hierarchical data in the table. CTEs increase modularity and simplify maintenance.
Analytic functions are special in the way that they need a known resultset to resolve. They depend on the following, preceding or full resultset to caculate current value. That said, merging view is never allowed on a view that contains an analytic function. Why? That will change the result.
Ex:
Select * from (
select row_number() over (partition by c1 order by c2) rw, c3 from t) z
where c3=123
is not the same as
select row_number() over (partition by c1 order by c2) rw, c3 from t
where c3=123
These 2 will return different values for rw. That's why sub-queries containing analytic functions will always be fully resolved before and never be merged with the rest.
Update
Looking at the 2nd query:
WITH myCTE
AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Score desc) AS RowNumber, 1 AS RecurseLevel
FROM tblGroups
WHERE ParentId IS NULL
)
SELECT tblGroups.*,
ROW_NUMBER() OVER (ORDER BY myCTE.RowNumber , tblGroups.Score desc) AS RowNumber,
RecurseLevel + 1 AS RecurseLevel
FROM tblGroups
JOIN myCTE
ON myCTE.GroupID = tblGroups.ParentID;
It works exactly as if it was written like (Same execution plan and result) :
SELECT tblGroups.*,
ROW_NUMBER() OVER (ORDER BY myCTE.RowNumber , tblGroups.Score desc) AS RowNumber,
RecurseLevel + 1 AS RecurseLevel
FROM tblGroups
JOIN (
SELECT *, ROW_NUMBER() OVER (ORDER BY Score desc) AS RowNumber, 1 AS RecurseLevel
FROM tblGroups
WHERE ParentId IS NULL
)myCTE ON myCTE.GroupID = tblGroups.ParentID;
This one needs to be partitioned to reset the rownumber.
Recursive queries don't work in a while loop, they are not procedural. At the base, they work like a recursive function, but depending on the tables, the query, the indexes, they can be optimized to run one way or the other.
If we do follow the concept that view cannot be merged when using analytic functions, and looking at query 1. It can only run once way, and it's in nested loop.
WITH myCTE
AS
( /*Cannot be merged*/
SELECT *, ROW_NUMBER() OVER (ORDER BY Score desc) AS RowNumber, 1 AS RecurseLevel,
cast(0 as bigint) n
FROM tblGroups
WHERE ParentId IS NULL
UNION ALL
/*Cannot be merged*/
SELECT tblGroups.*,
ROW_NUMBER() OVER (ORDER BY myCTE.RowNumber, tblGroups.Score desc) AS RowNumber, RecurseLevel + 1 AS RecurseLevel,
myCTE.RowNumber
FROM tblGroups
JOIN myCTE
ON myCTE.GroupID = tblGroups.ParentID
)
SELECT *
FROM myCTE;
So 1st select, cannot be merged 2nd, neither. The only way to run this query is in nested loop for each item returned in each level, hence the reset. Again, it's not a question of procedural or not, just a question of possible execution plan.
Hope this answers you question, let me if it doesn't:)
y
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With