Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to select using WITH RECURSIVE clause [closed]

I have googled and read throug some articles like this postgreSQL manual page or this blog page and tried making queries myself with a moderate success (part of them hangs, while others works good and fast), but so far I can not completely understand how this magic works.

Can anybody give very clear explanation demonstrating such query semantics and execution process, better based on typical samples like factorial calculation or full tree expansion from (id,parent_id,name) table?

And what are the basic guidlines and typical mistakes that one should know to make good with recursive queries?

like image 919
user2754703 Avatar asked Sep 06 '13 14:09

user2754703


People also ask

What is a recursive WITH clause?

A WITH clause that includes the RECURSIVE option iterates over its own output through repeated execution of a UNION or UNION ALL query. Recursive queries are useful when working with self-referential data—hierarchies such as manager-subordinate relationships, or tree-structured data such as taxonomies.

How do you write a recursive query in DB2?

DB2® for i provides two ways of defining a recursive query. The first one is called a hierarchical query which uses the CONNECT BY clause to define how a parent row is to be associated with its child rows. The second method is to use a recursive common table expression.

How does recursive CTE works in SQL Server?

A recursive CTE references itself. It returns the result subset, then it repeatedly (recursively) references itself, and stops when it returns all the results. FROM cte_name; Again, at the beginning of your CTE is the WITH clause.


1 Answers

First of all, let us try to simplify and clarify algorithm description given on the manual page. To simplify it consider only union all in with recursive clause for now (and union later):

WITH RECURSIVE pseudo-entity-name(column-names) AS (     Initial-SELECT UNION ALL     Recursive-SELECT using pseudo-entity-name ) Outer-SELECT using pseudo-entity-name 

To clarify it let us describe query execution process in pseudo code:

working-recordset = result of Initial-SELECT  append working-recordset to empty outer-recordset  while( working-recordset is not empty ) begin      new working-recordset = result of Recursive-SELECT          taking previous working-recordset as pseudo-entity-name      append working-recordset to outer-recordset  end  overall-result = result of Outer-SELECT      taking outer-recordset as pseudo-entity-name 

Or even shorter - Database engine executes initial select, taking its result rows as working set. Then it repeatedly executes recursive select on the working set, each time replacing contents of the working set with query result obtained. This process ends when empty set is returned by recursive select. And all result rows given firstly by initial select and then by recursive select are gathered and feeded to outer select, which result becomes overall query result.

This query is calculating factorial of 3:

WITH RECURSIVE factorial(F,n) AS (     SELECT 1 F, 3 n UNION ALL     SELECT F*n F, n-1 n from factorial where n>1 ) SELECT F from factorial where n=1 

Initial select SELECT 1 F, 3 n gives us initial values: 3 for argument and 1 for function value.
Recursive select SELECT F*n F, n-1 n from factorial where n>1 states that every time we need to multiply last funcion value by last argument value and decrement argument value.
Database engine executes it like this:

First of all it executes initail select, which gives the initial state of working recordset:

F | n --+-- 1 | 3 

Then it transforms working recordset with recursive query and obtain its second state:

F | n --+-- 3 | 2 

Then third state:

F | n --+-- 6 | 1 

In the third state there is no row which follows n>1 condition in recursive select, so forth working set is loop exits.

Outer recordset now holds all the rows, returned by initial and recursive select:

F | n --+-- 1 | 3 3 | 2 6 | 1 

Outer select filters out all intermediate results from outer recordset, showing only final factorial value which becomes overall query result:

F  -- 6 

And now let us consider table forest(id,parent_id,name):

id | parent_id | name ---+-----------+----------------- 1  |           | item 1 2  | 1         | subitem 1.1 3  | 1         | subitem 1.2 4  | 1         | subitem 1.3 5  | 3         | subsubitem 1.2.1 6  |           | item 2 7  | 6         | subitem 2.1 8  |           | item 3 

'Expanding full tree' here means sorting tree items in human-readable depth-first order while calculating their levels and (maybe) paths. Both tasks (of correct sorting and calculating level or path) are not solvable in one (or even any constant number of) SELECT without using WITH RECURSIVE clause (or Oracle CONNECT BY clause, which is not supported by PostgreSQL). But this recursive query does the job (well, almost does, see the note below):

WITH RECURSIVE fulltree(id,parent_id,level,name,path) AS (     SELECT id, parent_id, 1 as level, name, name||'' as path from forest where parent_id is null UNION ALL     SELECT t.id, t.parent_id, ft.level+1 as level, t.name, ft.path||' / '||t.name as path     from forest t, fulltree ft where t.parent_id = ft.id ) SELECT * from fulltree order by path 

Database engine executes it like this:

Firstly, it executes initail select, which gives all highest level items (roots) from forest table:

id | parent_id | level | name             | path ---+-----------+-------+------------------+---------------------------------------- 1  |           | 1     | item 1           | item 1 8  |           | 1     | item 3           | item 3 6  |           | 1     | item 2           | item 2 

Then, it executes recursive select, which gives all 2nd level items from forest table:

id | parent_id | level | name             | path ---+-----------+-------+------------------+---------------------------------------- 2  | 1         | 2     | subitem 1.1      | item 1 / subitem 1.1 3  | 1         | 2     | subitem 1.2      | item 1 / subitem 1.2 4  | 1         | 2     | subitem 1.3      | item 1 / subitem 1.3 7  | 6         | 2     | subitem 2.1      | item 2 / subitem 2.1 

Then, it executes recursive select again, retrieving 3d level items:

id | parent_id | level | name             | path ---+-----------+-------+------------------+---------------------------------------- 5  | 3         | 3     | subsubitem 1.2.1 | item 1 / subitem 1.2 / subsubitem 1.2.1 

And now it executes recursive select again, trying to retrieve 4th level items, but there are none of them, so the loop exits.

The outer SELECT sets the correct human-readable row order, sorting on path column:

id | parent_id | level | name             | path ---+-----------+-------+------------------+---------------------------------------- 1  |           | 1     | item 1           | item 1 2  | 1         | 2     | subitem 1.1      | item 1 / subitem 1.1 3  | 1         | 2     | subitem 1.2      | item 1 / subitem 1.2 5  | 3         | 3     | subsubitem 1.2.1 | item 1 / subitem 1.2 / subsubitem 1.2.1 4  | 1         | 2     | subitem 1.3      | item 1 / subitem 1.3 6  |           | 1     | item 2           | item 2 7  | 6         | 2     | subitem 2.1      | item 2 / subitem 2.1 8  |           | 1     | item 3           | item 3 

NOTE: Resulting row order will remain correct only while there are no punctuation characters collation-preceeding / in the item names. If we rename Item 2 in Item 1 *, it will break row order, standing between Item 1 and its descendants.
More stable solution is using tab character (E'\t') as path separator in query (which can be substituted by more readable path separator later: in outer select, before displaing to human or etc). Tab separated paths will retain correct order until there are tabs or control characters in the item names - which easily can be checked and ruled out without loss of usability.

It is very simple to modify last query to expand any arbitrary subtree - you need only to substitute condition parent_id is null with perent_id=1 (for example). Note that this query variant will return all levels and paths relative to Item 1.

And now about typical mistakes. The most notable typical mistake specific to recursive queries is defining ill stop conditions in recursive select, which results in infinite looping.

For example, if we omit where n>1 condition in factorial sample above, execution of recursive select will never give an empty set (because we have no condition to filter out single row) and looping will continue infinitely.

That is the most probable reason why some of your queries hang (the other non-specific but still possible reason is very ineffective select, which executes in finite but very long time).

There are not much RECURSIVE-specific querying guidlines to mention, as far as I know. But I would like to suggest (rather obvious) step by step recursive query building procedure.

  • Separately build and debug your initial select.

  • Wrap it with scaffolding WITH RECURSIVE construct
    and begin building and debugging your recursive select.

The recommended scuffolding construct is like this:

WITH RECURSIVE rec( <Your column names> ) AS (     <Your ready and working initial SELECT> UNION ALL     <Recursive SELECT that you are debugging now> ) SELECT * from rec limit 1000 

This simplest outer select will output the whole outer recordset, which, as we know, contains all output rows from initial select and every execution of recusrive select in a loop in their original output order - just like in samples above! The limit 1000 part will prevent hanging, replacing it with oversized output in which you will be able to see the missed stop point.

  • After debugging initial and recursive select build and debug your outer select.

And now the last thing to mention - the difference in using union instead of union all in with recursive clause. It introduces row uniqueness constraint which results in two extra lines in our execution pseudocode:

working-recordset = result of Initial-SELECT  discard duplicate rows from working-recordset /*union-specific*/  append working-recordset to empty outer-recordset  while( working-recordset is not empty ) begin      new working-recordset = result of Recursive-SELECT          taking previous working-recordset as pseudo-entity-name      discard duplicate rows and rows that have duplicates in outer-recordset          from working-recordset /*union-specific*/      append working-recordset to outer-recordset  end  overall-result = result of Outer-SELECT      taking outer-recordset as pseudo-entity-name 
like image 139
mas.morozov Avatar answered Sep 22 '22 06:09

mas.morozov