Joining arrays within group by clause

Tags: arrays, sql, postgresql, postgresql-9.1, group-by

We have a problem grouping arrays into a single array. We want to join the values from two columns into a single array and aggregate these arrays across multiple rows.

Given the following input:

| id | name | col_1 | col_2 |
| 1  |  a   |   1   |   2   |
| 2  |  a   |   3   |   4   |
| 4  |  b   |   7   |   8   |
| 3  |  b   |   5   |   6   |

We want the following output:

| a | { 1, 2, 3, 4 } |
| b | { 5, 6, 7, 8 } |

The order of the elements is important and should correlate with the id of the aggregated rows.

We tried the array_agg() function:

SELECT array_agg(ARRAY[col_1, col_2]) FROM mytable GROUP BY name;

Unfortunately, this statement raises an error:

ERROR: could not find array type for data type character varying[]

It seems to be impossible to merge arrays in a group by clause using array_agg().

Any ideas?


asked Jul 03 '14 15:07

tbz

2 Answers

`UNION ALL`

You could "unpivot" with UNION ALL first:

SELECT name, array_agg(c) AS c_arr
FROM  (
   SELECT name, id, 1 AS rnk, col1 AS c FROM tbl
   UNION ALL
   SELECT name, id, 2, col2 FROM tbl
   ORDER  BY name, id, rnk
   ) sub
GROUP  BY 1;

Adapted to produce the order of values you later requested. The manual:

The aggregate functions array_agg, json_agg, string_agg, and xmlagg, as well as similar user-defined aggregate functions, produce meaningfully different result values depending on the order of the input values. This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work.

Emphasis mine: the last sentence of the quote is what the sorted subquery above relies on.
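The first option the manual mentions, an ORDER BY clause within the aggregate call, can be applied to the UNION ALL query as well. A minimal sketch, assuming the sample rows from the question are supplied as an inline CTE with string values, and using the names tbl, col1 and col2 from this answer rather than mytable, col_1 and col_2 from the question:

```sql
-- Sample rows from the question as an inline CTE (assumption: string values,
-- since the error message in the question mentions character varying).
WITH tbl(id, name, col1, col2) AS (
   VALUES (1, 'a', '1', '2')
        , (2, 'a', '3', '4')
        , (4, 'b', '7', '8')
        , (3, 'b', '5', '6')
   )
SELECT name, array_agg(c ORDER BY id, rnk) AS c_arr  -- order fixed inside the aggregate call
FROM  (
   SELECT name, id, 1 AS rnk, col1 AS c FROM tbl
   UNION ALL
   SELECT name, id, 2, col2 FROM tbl
   ) sub
GROUP  BY 1
ORDER  BY 1;
-- a | {1,2,3,4}
-- b | {5,6,7,8}
```

This drops the ORDER BY from the subquery and does not depend on the subquery's row order surviving the aggregation.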

`LATERAL` subquery with `VALUES` expression

LATERAL requires Postgres 9.3 or later.

SELECT t.name, array_agg(c) AS c_arr
FROM  (SELECT * FROM tbl ORDER BY name, id) t
CROSS  JOIN LATERAL (VALUES (t.col1), (t.col2)) v(c)
GROUP  BY 1;

Same result. Only needs a single pass over the table.
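If you would rather not depend on the sorted subquery, the position within each row can be carried along in the VALUES expression and used in an ORDER BY inside the aggregate call. A sketch under the same assumptions as the query above (Postgres 9.3 or later, names tbl, col1, col2); the inline CTE with string values and the extra column rnk are only there for illustration:

```sql
-- Sample rows from the question as an inline CTE (illustration only).
WITH tbl(id, name, col1, col2) AS (
   VALUES (1, 'a', '1', '2')
        , (2, 'a', '3', '4')
        , (4, 'b', '7', '8')
        , (3, 'b', '5', '6')
   )
SELECT t.name, array_agg(v.c ORDER BY t.id, v.rnk) AS c_arr
FROM   tbl t
CROSS  JOIN LATERAL (VALUES (1, t.col1), (2, t.col2)) v(rnk, c)  -- rnk: position of the value within the row
GROUP  BY 1
ORDER  BY 1;
-- a | {1,2,3,4}
-- b | {5,6,7,8}
```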

Custom aggregate function

Or you could create a custom aggregate function, as discussed in these related answers:

Selecting data into a Postgres array
Is there something like a zip() function in PostgreSQL that combines two arrays?

CREATE AGGREGATE array_agg_mult (anyarray)  (
    SFUNC     = array_cat
  , STYPE     = anyarray
  , INITCOND  = '{}'
);

Then you can:

SELECT name, array_agg_mult(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Or, typically faster, while not standard SQL:

SELECT name, array_agg_mult(ARRAY[col1, col2]) AS c_arr
FROM  (SELECT * FROM tbl ORDER BY name, id) t
GROUP  BY 1;

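The added ORDER BY id in the first query (such a clause can be appended inside the call of any such aggregate function) guarantees your desired result; the sorted subquery in the second query normally produces the same: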

a | {1,2,3,4}
b | {5,6,7,8}
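A note for newer versions: as far as I know, Postgres 14 changed array_cat() to take anycompatiblearray instead of anyarray arguments, so the aggregate definition above should no longer be accepted there. A sketch of the adapted definition, assuming Postgres 14 or later, with the sample rows supplied as an inline CTE for illustration:

```sql
-- Assumption: Postgres 14 or later, where array_cat() is declared
-- with anycompatiblearray arguments instead of anyarray.
CREATE AGGREGATE array_agg_mult (anycompatiblearray) (
    SFUNC     = array_cat
  , STYPE     = anycompatiblearray
  , INITCOND  = '{}'
);

-- Same call as before; sample rows as an inline CTE (illustration only).
WITH tbl(id, name, col1, col2) AS (
   VALUES (1, 'a', '1', '2')
        , (2, 'a', '3', '4')
        , (4, 'b', '7', '8')
        , (3, 'b', '5', '6')
   )
SELECT name, array_agg_mult(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;
```

On older versions, keep the anyarray definition shown above.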

Or you might be interested in this alternative:

SELECT name, array_agg_mult(ARRAY[ARRAY[col1, col2]] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Which produces 2-dimensional arrays:

a | {{1,2},{3,4}}
b | {{5,6},{7,8}}

The last one can be replaced (and should be, as it's faster!) with the built-in array_agg() in Postgres 9.5 or later, which gained the capability of aggregating arrays:

SELECT name, array_agg(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Same result. The manual:

input arrays concatenated into array of one higher dimension (inputs must all have same dimensionality, and cannot be empty or null)

So it is not exactly the same as our custom aggregate function array_agg_mult().


answered Oct 14 '22 02:10

Erwin Brandstetter

select n, array_agg(c) as c
from (
    select n, unnest(array[c1, c2]) as c
    from t
) s
group by n

Or simpler (all c1 values first, then all c2 values):

select
    n,
    array_agg(c1) || array_agg(c2) as c
from t
group by n

To address the new ordering requirement:

select n, array_agg(c order by id, o) as c
from (
    select
        id, n,
        unnest(array[c1, c2]) as c,
        unnest(array[1, 2]) as o
    from t
) s
group by n
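The two parallel unnest() calls in the select list can also be replaced by a single unnest() with ordinality, which returns the element position itself. A sketch, assuming Postgres 9.4 or later and the same names t, n, c1, c2 as above; the inline CTE with string values stands in for the real table:

```sql
-- Sample rows as an inline CTE (illustration only).
with t(id, n, c1, c2) as (
    values (1, 'a', '1', '2')
         , (2, 'a', '3', '4')
         , (4, 'b', '7', '8')
         , (3, 'b', '5', '6')
)
select t.n, array_agg(u.c order by t.id, u.o) as c
from t
cross join lateral unnest(array[t.c1, t.c2]) with ordinality as u(c, o)  -- o: position within the row
group by t.n
order by t.n;
-- a | {1,2,3,4}
-- b | {5,6,7,8}
```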

answered Oct 14 '22 04:10

Clodoaldo Neto
