
array of distinct values aggregated from an array column in Postgres

Suppose we have (in PostgreSQL 9.1) a table with an identifier, a column of type integer[], and one or more other columns of type integer (or any other type that can be summed).

The goal is an aggregate query that gives, for each identifier, the sum of the "summable" column and an array of all distinct elements of the array column.

The only way I can find is to use the unnest function on the array column in one subquery and then join it with another subquery that aggregates the "summable" columns.

A simple example is as follows:

CREATE TEMP TABLE a (id integer, aint integer[], summable_val integer);
INSERT INTO a VALUES
(1, array[1,2,3], 5),
(2, array[2,3,4], 6),
(3, array[3,4,5], 2),
(1, array[7,8,9], 19);

WITH u AS (
    SELECT id, unnest(aint) AS t FROM a GROUP BY 1, 2
),
d AS (
    SELECT id, array_agg(DISTINCT t) AS ar FROM u GROUP BY 1
),
v AS (
    SELECT id, sum(summable_val) AS val FROM a GROUP BY 1
)
SELECT v.id, v.val, d.ar
FROM v
JOIN d ON v.id = d.id;

The code above does what I intended, but can we do any better? The main drawback of this solution is that it reads and aggregates the table twice, which might be a problem for larger tables.

Another solution to the general problem is to avoid the array column altogether: aggregate the "summable" column for each array member and then use array_agg in the aggregation. At least for now, though, I'd like to stick with the array approach.

Thanks in advance for any ideas.

One Data Guy asked Feb 18 '13 11:02




1 Answer

The following query may be slightly faster (I suppose), but I cannot see any significant optimization:

select a.id, sum(summable_val) val, ar
from
    (select id, array_agg(distinct t) ar 
        from 
        (select id, unnest(aint) as t from a group by 1,2) u
    group by 1) x
    join a on x.id = a.id
group by 1,3
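
If the main concern is reading the table twice, one possible direction is a custom aggregate that concatenates the arrays during the same pass that computes the sum, followed by deduplication of the concatenated array. The following is only a sketch under assumptions: array_cat_agg is a hypothetical helper name (not a built-in), the table a is the one from the question, and whether this is actually faster would have to be measured, since repeated array concatenation has its own cost.

```sql
-- Assumption: array_cat_agg is a hypothetical helper name, not a built-in.
-- On PostgreSQL 14 and later, array_cat is declared over anycompatiblearray,
-- so both occurrences of anyarray would need to become anycompatiblearray.
CREATE AGGREGATE array_cat_agg(anyarray) (
    SFUNC = array_cat,
    STYPE = anyarray
);

SELECT s.id,
       s.val,
       -- Deduplicate (and sort) the concatenated array for each id.
       ARRAY(SELECT DISTINCT e FROM unnest(s.all_elems) AS e ORDER BY e) AS ar
FROM (
    -- Single scan of a: the sum and the array concatenation
    -- are computed in the same aggregation pass.
    SELECT id,
           sum(summable_val) AS val,
           array_cat_agg(aint) AS all_elems
    FROM a
    GROUP BY id
) AS s;
```

With the sample data from the question, the row for id 1 should come out with val 24 and ar {1,2,3,7,8,9}.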
klin answered Oct 08 '22 21:10
