Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL WITH RECURSIVE performance

I have a simple question. Somehow I was unable to find a definitive answer.

How much is WITH RECURSIVE syntax optimized in PostgreSQL? By that I mean: is it merely a syntactic sugar for a series of non recursive queries, OR is it more of a single statement that despite its complicated semantics has been optimized as a whole. A follow-up question - just about how much is it possible to optimize this kind of syntax? Of course some concrete data on the matter is most welcome.

like image 671
julx Avatar asked May 02 '11 19:05

julx


People also ask

What is with recursive in PostgreSQL?

The recursive term can be one or more CTE query definitions joined with the non-recursive term through the UNION or UNION ALL operator. The recursive term references the CTE name itself. The recursion stops when no rows are returned from the previous iteration.

Does PostgreSQL support CTE?

PostgreSQL conforms to the ANSI SQL-99 standard and implementing CTEs in PostgreSQL is similar to SQL Server. CTE is also known as WITH query. This type of query helps you to simplify long queries, it is similar to defining temporary tables that exist only for the running of the query.

How do I create a hierarchical query in PostgreSQL?

CONNECT BY, PRIOR and START WITH in Oracle In Oracle, the hierarchical query is defined using the two mandatory keywords i.e. CONNECT BY and START WITH. The hierarchy is built when the CONNECT BY defines the relationship between parent and child, the PRIOR keyword used with CONNECT by specifies the parent.


2 Answers

I've found it optimized up to a point.

The various subqueries are re-used as expected and are optimized individually, and Postgres optimizes the latter just like any other query.

My main gripe with it has to do with that it won't inject constraints into the CTEs when it could.

For instance:

with recursive
parents as (
select node.id,
       node.parent_id
from nodes as node
union all
select node.id,
       parent.parent_id
from parents as node
join nodes as parent on parent.id = node.parent_id
)
select parent_id
from parents
where id = 2;

Postgres would ideally understand, in the above, that (since node.id is returned as is) it can do:

with recursive
parents as (
select node.id,
       node.parent_id
from nodes as node
where id = 2
union all
select node.id,
       parent.parent_id
from parents as node
join nodes as parent on parent.id = node.parent_id
)
select parent_id
from parents;

... and use an index scan on the primary key. In practice, it'll actually do exactly when the CTE tells it to do: recursively pull all parents for all rows, place the result set in an unnamed temporary table if needed, and then check each row from the result set one for id = 2.

In other words, a CTE does not keep a trace of the "originating" table/row/column set that it's returning. Until this gets optimized properly, creating a view on a recursive query is crazy at best.

A good workaround in the meanwhile is to create an sql function instead:

create function parents(id int) as returns table (id int) $$
    with recursive
    parents as (
    select node.id,
           node.parent_id
    from nodes as node
    where id = $1
    union all
    select node.id,
           parent.parent_id
    from parents as node
    join nodes as parent on parent.id = node.parent_id
    )
    select parent_id
    from parents;
$$ language sql stable strict rows 5 cost 1;

Another issue is you can't use FOR UPDATE with recursive CTEs (for very much the same reason, in fact).

like image 141
Denis de Bernardy Avatar answered Oct 20 '22 06:10

Denis de Bernardy


My experience is that it is indeed very well optimized.

Check out the execution plan for your query generated by EXPLAIN ANALYZE and you'll see how "costly" it really is (and then compare that e.g. to a self written recursive function)

like image 42
a_horse_with_no_name Avatar answered Oct 20 '22 06:10

a_horse_with_no_name