I have a simple question. Somehow I was unable to find a definitive answer.
How much is WITH RECURSIVE
syntax optimized in PostgreSQL? By that I mean: is it merely a syntactic sugar for a series of non recursive queries, OR is it more of a single statement that despite its complicated semantics has been optimized as a whole. A follow-up question - just about how much is it possible to optimize this kind of syntax? Of course some concrete data on the matter is most welcome.
The recursive term can be one or more CTE query definitions joined with the non-recursive term through the UNION or UNION ALL operator. The recursive term references the CTE name itself. The recursion stops when no rows are returned from the previous iteration.
PostgreSQL conforms to the ANSI SQL-99 standard and implementing CTEs in PostgreSQL is similar to SQL Server. CTE is also known as WITH query. This type of query helps you to simplify long queries, it is similar to defining temporary tables that exist only for the running of the query.
CONNECT BY, PRIOR and START WITH in Oracle In Oracle, the hierarchical query is defined using the two mandatory keywords i.e. CONNECT BY and START WITH. The hierarchy is built when the CONNECT BY defines the relationship between parent and child, the PRIOR keyword used with CONNECT by specifies the parent.
I've found it optimized up to a point.
The various subqueries are re-used as expected and are optimized individually, and Postgres optimizes the latter just like any other query.
My main gripe with it has to do with that it won't inject constraints into the CTEs when it could.
For instance:
with recursive
parents as (
select node.id,
node.parent_id
from nodes as node
union all
select node.id,
parent.parent_id
from parents as node
join nodes as parent on parent.id = node.parent_id
)
select parent_id
from parents
where id = 2;
Postgres would ideally understand, in the above, that (since node.id is returned as is) it can do:
with recursive
parents as (
select node.id,
node.parent_id
from nodes as node
where id = 2
union all
select node.id,
parent.parent_id
from parents as node
join nodes as parent on parent.id = node.parent_id
)
select parent_id
from parents;
... and use an index scan on the primary key. In practice, it'll actually do exactly when the CTE tells it to do: recursively pull all parents for all rows, place the result set in an unnamed temporary table if needed, and then check each row from the result set one for id = 2.
In other words, a CTE does not keep a trace of the "originating" table/row/column set that it's returning. Until this gets optimized properly, creating a view on a recursive query is crazy at best.
A good workaround in the meanwhile is to create an sql function instead:
create function parents(id int) as returns table (id int) $$
with recursive
parents as (
select node.id,
node.parent_id
from nodes as node
where id = $1
union all
select node.id,
parent.parent_id
from parents as node
join nodes as parent on parent.id = node.parent_id
)
select parent_id
from parents;
$$ language sql stable strict rows 5 cost 1;
Another issue is you can't use FOR UPDATE with recursive CTEs (for very much the same reason, in fact).
My experience is that it is indeed very well optimized.
Check out the execution plan for your query generated by EXPLAIN ANALYZE and you'll see how "costly" it really is (and then compare that e.g. to a self written recursive function)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With