I get that an anchor is necessary, that makes sense. And I know that a <code>UNION ALL</code> is needed, if your recursive CTE doesn't have one, it just doesn't work... but I can't find a good explanation of why that is the case. All the documentation just states that you need it. Why can't we use a <code>UNION</code> instead of a <code>UNION ALL</code> in a recursive query? It seems like it would be a good idea to not include duplicates upon deeper recursion, doesn't it? Something like that should already be working under the hood already, I would think.

I presume the reason is that they just haven't considered this a priority feature worth implementing. It looks like Postgres does support both <code>UNION</code> and <code>UNION ALL</code>. If you have a strong case for this feature you can provide feedback at Connect (or whatever the URL of its replacement will be). Preventing duplicates being added could be useful as a duplicate row added in a later step to a previous one will nearly always end up causing an infinite loop or exceeding the max recursion limit. There are quite a few places in the SQL Standards where code is used demonstrating <code>UNION</code> such as below <img src="https://i.stack.imgur.com/2KAeM.png" alt="enter image description here"> This article explains how they are implemented in SQL Server. They aren't doing anything like that "under the hood". The stack spool deletes rows as it goes so it wouldn't be possible to know if a later row is a duplicate of a deleted one. Supporting <code>UNION</code> would need a somewhat different approach. In the meantime you can quite easily achieve the same in a multi statement TVF. To take a silly example below (Postgres Fiddle) <pre class="prettyprint"><code>WITH R AS (SELECT 0 AS N UNION SELECT ( N + 1 )%10 FROM R) SELECT N FROM R </code></pre> Changing the <code>UNION</code> to <code>UNION ALL</code> and adding a <code>DISTINCT</code> at the end won't save you from the infinite recursion. But you can implement this as <pre class="prettyprint"><code>CREATE FUNCTION dbo.F () RETURNS @R TABLE(n INT PRIMARY KEY WITH (IGNORE_DUP_KEY = ON)) AS BEGIN INSERT INTO @R VALUES (0); --anchor WHILE @@ROWCOUNT > 0 BEGIN INSERT INTO @R SELECT ( N + 1 )%10 FROM @R END RETURN END GO SELECT * FROM dbo.F () </code></pre> The above uses <code>IGNORE_DUP_KEY</code> to discard duplicates. If the column list is too wide to be indexed you would need <code>DISTINCT</code> and <code>NOT EXISTS</code> instead. You'd also probably want a parameter to set the max number of recursions and avoid infinite loops.

This is pure speculation, but I would say, that the UNION ALL ensures, that the result of each iteration can be calculated individually. Essentially it ensures, that an iteration cannot interfere with another. A UNION would require a sort operation in the background which might modify the result of previous iterations. The program should not change the state of a previous call in the call stack, it should interact with it using input parameters and the result of the subsequent iteration (in a procedural setting). This probably should apply to set based operations, thus to SQL Server's recursive CTEs. I might be wrong, late night brain-dumps are not 100% reliable :) Edit (just another thought): When a recursion starts, you have a call stack. Each level in this stack starts calculating it's result, but should wait for the result of all subsequent calls before it can finish and return it's result. UNION would try to eliminate duplication, but you don't have any records until you reach the termination condition (and the final would be built from the bottom to the top), but the result of the subsequent call is required by the ones above it. The UNION would be reduced to a DISTINCT at the very end.

Why does a Recursive CTE in Transact-SQL require a UNION ALL and not a UNION?

Tags:

sql-server

tsql

recursion

common-table-expression

I get that an anchor is necessary, that makes sense. And I know that a UNION ALL is needed, if your recursive CTE doesn't have one, it just doesn't work... but I can't find a good explanation of why that is the case. All the documentation just states that you need it.

Why can't we use a UNION instead of a UNION ALL in a recursive query? It seems like it would be a good idea to not include duplicates upon deeper recursion, doesn't it? Something like that should already be working under the hood already, I would think.

324

asked Dec 27 '17 21:12

Jeremy Holovacs

3 Answers

I presume the reason is that they just haven't considered this a priority feature worth implementing. It looks like Postgres does support both UNION and UNION ALL.

If you have a strong case for this feature you can provide feedback at Connect (or whatever the URL of its replacement will be).

Preventing duplicates being added could be useful as a duplicate row added in a later step to a previous one will nearly always end up causing an infinite loop or exceeding the max recursion limit.

There are quite a few places in the SQL Standards where code is used demonstrating UNION such as below

enter image description here

This article explains how they are implemented in SQL Server. They aren't doing anything like that "under the hood". The stack spool deletes rows as it goes so it wouldn't be possible to know if a later row is a duplicate of a deleted one. Supporting UNION would need a somewhat different approach.

In the meantime you can quite easily achieve the same in a multi statement TVF.

To take a silly example below (Postgres Fiddle)

WITH R
     AS (SELECT 0 AS N
         UNION
         SELECT ( N + 1 )%10
         FROM   R)
SELECT N
FROM   R

Changing the UNION to UNION ALL and adding a DISTINCT at the end won't save you from the infinite recursion.

But you can implement this as

CREATE FUNCTION dbo.F ()
RETURNS @R TABLE(n INT PRIMARY KEY WITH (IGNORE_DUP_KEY = ON))
AS
  BEGIN
      INSERT INTO @R
      VALUES      (0); --anchor

      WHILE @@ROWCOUNT > 0
        BEGIN
            INSERT INTO @R
            SELECT ( N + 1 )%10
            FROM   @R
        END

      RETURN
  END

GO

SELECT *
FROM   dbo.F ()

The above uses IGNORE_DUP_KEY to discard duplicates. If the column list is too wide to be indexed you would need DISTINCT and NOT EXISTS instead. You'd also probably want a parameter to set the max number of recursions and avoid infinite loops.

140

answered Oct 21 '22 02:10

Martin Smith

A good explanation of pred post speculation here : https://sqlite.org/lang_with.html :

Optimization note: ...... Very little memory is needed to run the above example. However, if the example had used UNION instead of UNION ALL, then SQLite would have had to keep around all previously generated content in order to check for duplicates. For this reason, programmers should strive to use UNION ALL instead of UNION when feasible.

answered Oct 21 '22 03:10

B. M.

This is pure speculation, but I would say, that the UNION ALL ensures, that the result of each iteration can be calculated individually. Essentially it ensures, that an iteration cannot interfere with another.

A UNION would require a sort operation in the background which might modify the result of previous iterations. The program should not change the state of a previous call in the call stack, it should interact with it using input parameters and the result of the subsequent iteration (in a procedural setting). This probably should apply to set based operations, thus to SQL Server's recursive CTEs.

I might be wrong, late night brain-dumps are not 100% reliable :)

Edit (just another thought):

When a recursion starts, you have a call stack. Each level in this stack starts calculating it's result, but should wait for the result of all subsequent calls before it can finish and return it's result. UNION would try to eliminate duplication, but you don't have any records until you reach the termination condition (and the final would be built from the bottom to the top), but the result of the subsequent call is required by the ones above it. The UNION would be reduced to a DISTINCT at the very end.

answered Oct 21 '22 04:10

Pred

Related questions
                            
                                SQL Server ALTER field NOT NULL takes forever
                            
                                SQL Server 2005: Why Name Transactions?
                            
                                order by a parameter
                            
                                Inverse of COALESCE
                            
                                how to retrieve a non sa password in SQL Server?
                            
                                Linq-to-sql failing on insert and update when this is a trigger attached
                            
                                SQL query on multiple databases
                            
                                How to design Date-of-Birth in DB and ORM for mix of known and unknown date parts
                            
                                how to query SQL Server via REST to get XML
                            
                                Is there a way to turn off implicit type conversion in SQL Server?
                            
                                Updating in a many-to-many relationship
                            
                                SQL Server MERGE + Joining other tables
                            
                                TABLESAMPLE returns wrong number of rows?
                            
                                Convert a .bak file to .sql file
                            
                                Why does SUM(...) on an empty recordset return NULL instead of 0?
                            
                                Lucene and SQL Server - best practice
                            
                                Correct SQL index for Partition + Order to remove SORT
                            
                                SQL Azure Geo Replication for non-redunancy purposes
                            
                                Merge multiple rows with same ID into one row
                            
                                Sql Server Transaction Commit times out

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With