Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

hierarchical sum in PostgreSQL

This is a simplified version of a problem I am encountering in PostgreSQL.

I have the following table A:

[ ID INTEGER | VALUE NUMERIC(10,2) | PARENT INTEGER ]

Where 'PARENT' is a self-referencing FK to column ID.

The table definition is:

CREATE TABLE A(ID INTEGER IDENTITY, VALUE NUMERIC(10,2), PARENT INTEGER)                                    
ALTER  TABLE A ADD CONSTRAINT FK FOREIGN KEY (PARENT) REFERENCES A(ID) 

This simple table allows one to define tree data structures of arbitrary depth. Now I need to write a SQL (I prefer not to use server-side PL-SQL) that reports for each node, the total value of the sub-tree "hanging" under it. For instance, with the following table:

|  ID  | VALUE | PARENT |
-------------------------
|   1  | NULL  | NULL   |
|   2  | 3.50  |    1   |
|   3  | NULL  | NULL   |
|   4  | NULL  |    3   |
|   5  | 1.50  |    4   |
|   6  | 2.20  |    4   |

I should get the following result set:

| ID  |  Total-Value-of-Subtree |
|  1  |                  3.50   |
|  2  |                  3.50   |
|  3  |                  3.70   |
|  4  |                  3.70   |
|  5  |                  1.50   |
|  6  |                  2.20   |

For simplicitly, you can assume that only leaf nodes have values, non-leaf nodes always have a value of NULL in the VALUE column. Is there a way to do this in SQL, even utilizing PostgreSQL-specific extensions?

like image 293
Marcus Junius Brutus Avatar asked Nov 02 '12 08:11

Marcus Junius Brutus


People also ask

How do I create a hierarchical query in PostgreSQL?

CONNECT BY, PRIOR and START WITH in Oracle In Oracle, the hierarchical query is defined using the two mandatory keywords i.e. CONNECT BY and START WITH. The hierarchy is built when the CONNECT BY defines the relationship between parent and child, the PRIOR keyword used with CONNECT by specifies the parent.

Can we use CTE in PostgreSQL?

In PostgreSQL, the CTE(Common Table Expression) is used as a temporary result set that the user can reference within another SQL statement like SELECT, INSERT, UPDATE or DELETE. CTEs are temporary in the sense that they only exist during the execution of the query.

How recursive query works in PostgreSQL?

PostgreSQL executes a recursive CTE in the following sequence: Execute the non-recursive term to create the base result set (R0). Execute recursive term with Ri as an input to return the result set Ri+1 as the output. Repeat step 2 until an empty set is returned.


2 Answers

Maybe something like this:

WITH RECURSIVE CTE
AS
(
    SELECT
        t.ID,
        t.VALUE,
        t.PARENT
    FROM
        t
    WHERE NOT EXISTS
        (
            SELECT NULL FROM t AS t2 WHERE t2.PARENT=t.ID
        )
    UNION ALL
    SELECT
        t.ID,
        COALESCE(t.VALUE,CTE.VALUE),
        t.PARENT
    FROM
        t
        JOIN CTE
            ON CTE.PARENT=t.ID
)
SELECT
    CTE.ID,
    SUM(CTE.VALUE)
FROM
    CTE
GROUP BY
    CTE.ID
ORDER BY 
    ID;

This will start with the children that has no children. Then go up the tree to the parents. The result will be like this:

1   3.50
2   3.50
3   3.70
4   3.70
5   1.50
6   2.20
like image 31
Arion Avatar answered Oct 06 '22 19:10

Arion


In PostgreSQL you can use recursive CTEs (Common Table Expression) to walk trees in your queries.

Here are two relevant links into the docs:

  • Syntax
  • Examples

EDIT

Since there is no subselect required it might run a little better on a larger dataset than Arion's query.

WITH RECURSIVE children AS (
    -- select leaf nodes
    SELECT id, value, parent
        FROM t
        WHERE value IS NOT NULL
    UNION ALL
    -- propagate values of leaf nodes up, adding rows 
    SELECT t.id, children.value, t.parent
        FROM children JOIN t ON children.parent = t.id
)
SELECT id, sum(value) 
    FROM children 
    GROUP BY id   -- sum up appropriate rows
    ORDER BY id;
like image 127
A.H. Avatar answered Oct 06 '22 20:10

A.H.