Suppose we have (in PostgreSQL 9.1) a table with some identifier, a column of type integer[] and some other columns (at least one, although there might be more) of type integer (or any other which can be summed).
The goal is to have an aggregate giving for each identifier sum of the "summable" column and an array of all distinct elements of the array column.
The only way I can find is to use unnest function on the array column in a subquery and than join it with another subquery aggregating the "summable" columns.
A simple example is as follows:
CREATE TEMP TABLE a (id integer, aint integer[], summable_val integer);
INSERT INTO a VALUES
(1, array[1,2,3], 5),
(2, array[2,3,4], 6),
(3, array[3,4,5], 2),
(1, array[7,8,9], 19);
WITH u AS (
SELECT id, unnest(aint) as t FROM a GROUP BY 1,2
),
d AS (
SELECT id, array_agg(distinct t) ar FROM u GROUP BY 1),
v as (
SELECT id, sum(summable_val) AS val
FROM a GROUP BY 1
)
SELECT v.id, v.val, d.ar
FROM v
JOIN d
ON v.id = d.id;
The code above does what I intended but the question is can we do any better? Main drawback of this solution is that it reads and aggregate table twice which might be troublesome for larger tables.
Some other solution to the general problem is to avoid using the array column and agregate "summable" column for each array member and then use array_agg
in aggregation - but at least for now I'd like to stick to this array way.
Thanks in advance for any ideas.
You can use DISTINCT to eliminate duplicate values in aggregate function calculations.
To define a new aggregate function, one selects a data type for the state value, an initial value for the state, and a state transition function. The state transition function takes the previous state value and the aggregate's input value(s) for the current row, and returns a new state value.
The coalesce function can be used to substitute zero or an empty array for null when necessary. Here ANY can be considered either as introducing a subquery, or as being an aggregate function, if the subquery returns one row with a Boolean value.
Removing duplicate rows from a query result set in PostgreSQL can be done using the SELECT statement with the DISTINCT clause. It keeps one row for each group of duplicates. The DISTINCT clause can be used for a single column or for a list of columns.
The query may be a little bit faster (I suppose) but I cannot see any remarkable optimizations:
select a.id, sum(summable_val) val, ar
from
(select id, array_agg(distinct t) ar
from
(select id, unnest(aint) as t from a group by 1,2) u
group by 1) x
join a on x.id = a.id
group by 1,3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With