Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Joining arrays within group by clause

We have a problem grouping arrays into a single array. We want to join the values from two columns into one single array and aggregate these arrays of multiple rows.

Given the following input:

| id | name | col_1 | col_2 |
| 1  |  a   |   1   |   2   |
| 2  |  a   |   3   |   4   |
| 4  |  b   |   7   |   8   |
| 3  |  b   |   5   |   6   |

We want the following output:

| a | { 1, 2, 3, 4 } |
| b | { 5, 6, 7, 8 } |

The order of the elements is important and should correlate with the id of the aggregated rows.

We tried the array_agg() function:

SELECT array_agg(ARRAY[col_1, col_2]) FROM mytable GROUP BY name;

Unfortunately, this statement raises an error:

ERROR: could not find array type for data type character varying[]

It seems to be impossible to merge arrays in a group by clause using array_agg().

Any ideas?

like image 517
tbz Avatar asked Jul 03 '14 15:07

tbz


People also ask

How do I join an array in SQL?

This is the function to use if you want to concatenate all the values in an array field into one string value. You can specify an optional argument as a separator, and it can be any string. If you do not specify a separator, there will be nothing aded between the values.

What is Array_agg in SQL?

PostgreSQL ARRAY_AGG() function is an aggregate function that accepts a set of values and returns an array where each value in the input set is assigned to an element of the array. Syntax: ARRAY_AGG(expression [ORDER BY [sort_expression {ASC | DESC}], [...]) The ORDER BY clause is an voluntary clause.

How does the GROUP BY clause work?

The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". The GROUP BY statement is often used with aggregate functions ( COUNT() , MAX() , MIN() , SUM() , AVG() ) to group the result-set by one or more columns.


2 Answers

UNION ALL

You could "unpivot" with UNION ALL first:

SELECT name, array_agg(c) AS c_arr
FROM  (
   SELECT name, id, 1 AS rnk, col1 AS c FROM tbl
   UNION ALL
   SELECT name, id, 2, col2 FROM tbl
   ORDER  BY name, id, rnk
   ) sub
GROUP  BY 1;

Adapted to produce the order of values you later requested. The manual:

The aggregate functions array_agg, json_agg, string_agg, and xmlagg, as well as similar user-defined aggregate functions, produce meaningfully different result values depending on the order of the input values. This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work.

Bold emphasis mine.

LATERAL subquery with VALUES expression

LATERAL requires Postgres 9.3 or later.

SELECT t.name, array_agg(c) AS c_arr
FROM  (SELECT * FROM tbl ORDER BY name, id) t
CROSS  JOIN LATERAL (VALUES (t.col1), (t.col2)) v(c)
GROUP  BY 1;

Same result. Only needs a single pass over the table.

Custom aggregate function

Or you could create a custom aggregate function like discussed in these related answers:

  • Selecting data into a Postgres array
  • Is there something like a zip() function in PostgreSQL that combines two arrays?
CREATE AGGREGATE array_agg_mult (anyarray)  (
    SFUNC     = array_cat
  , STYPE     = anyarray
  , INITCOND  = '{}'
);

Then you can:

SELECT name, array_agg_mult(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Or, typically faster, while not standard SQL:

SELECT name, array_agg_mult(ARRAY[col1, col2]) AS c_arr
FROM  (SELECT * FROM tbl ORDER BY name, id) t
GROUP  BY 1;

The added ORDER BY id (which can be appended to such aggregate functions) guarantees your desired result:

a | {1,2,3,4}
b | {5,6,7,8}

Or you might be interested in this alternative:

SELECT name, array_agg_mult(ARRAY[ARRAY[col1, col2]] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Which produces 2-dimensional arrays:

a | {{1,2},{3,4}}
b | {{5,6},{7,8}}

The last one can be replaced (and should be, as it's faster!) with the built-in array_agg() in Postgres 9.5 or later - with its added capability of aggregating arrays:

SELECT name, array_agg(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Same result. The manual:

input arrays concatenated into array of one higher dimension (inputs must all have same dimensionality, and cannot be empty or null)

So not exactly the same as our custom aggregate function array_agg_mult();

like image 170
Erwin Brandstetter Avatar answered Oct 14 '22 02:10

Erwin Brandstetter


select n, array_agg(c) as c
from (
    select n, unnest(array[c1, c2]) as c
    from t
) s
group by n

Or simpler

select
    n,
    array_agg(c1) || array_agg(c2) as c
from t
group by n

To address the new ordering requirement:

select n, array_agg(c order by id, o) as c
from (
    select
        id, n,
        unnest(array[c1, c2]) as c,
        unnest(array[1, 2]) as o
    from t
) s
group by n
like image 42
Clodoaldo Neto Avatar answered Oct 14 '22 04:10

Clodoaldo Neto