Ordering distinct column values by (first value of) other column in aggregate function

I'm trying to order some distinct aggregated text by the value of another column, with something like:

string_agg(DISTINCT sometext, ' ' ORDER BY numval)

However, that results in the error:

ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list

I do understand why this is, since the ordering would be "ill-defined" if the numval of two repeated values differs, with that of another lying in-between.
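
To make the ambiguity concrete, here is a minimal sketch with made-up rows (the alias t and the sample values are assumptions for illustration only):

```sql
-- Hypothetical sample data: 'a' appears with numval 1 and 3, 'b' with 2.
SELECT sometext, numval
FROM  (VALUES ('a', 1), ('b', 2), ('a', 3)) AS t(sometext, numval)
ORDER  BY numval;
-- After DISTINCT only one 'a' remains. Sorted by numval it could go
-- before 'b' (numval 1) or after 'b' (numval 3), so the order is ambiguous.
```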

Ideally, I would like to order them by first appearance / lowest order-by value. However, the ill-defined cases are rare enough in my data (I mostly want DISTINCT to get rid of sequentially repeated values) that I don't particularly care about their ordering. I would be happy with something like MySQL's GROUP_CONCAT(DISTINCT sometext ORDER BY numval SEPARATOR ' '), which simply works despite its sloppiness.

I expect some Postgres contortionism will be necessary, but I don't really know what the most efficient/concise way of going about this would be.

Asked by Dologan on Aug 07 '14 at 10:08


2 Answers

Eliminate the need for DISTINCT by pre-aggregating:

select string_agg(sometext, ' ' order by numval)
from (
    select sometext, min(numval) as numval
    from t
    group by sometext
) s
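
As a quick check of the pre-aggregation idea, the same query can be run against a few inline rows. The sample values below are hypothetical and only stand in for table t:

```sql
-- Hypothetical sample rows standing in for table t.
WITH t(sometext, numval) AS (
   VALUES ('a', 1), ('a', 2), ('b', 3), ('a', 4), ('c', 5)
)
SELECT string_agg(sometext, ' ' ORDER BY numval) AS no_dupe
FROM  (
   -- One row per distinct sometext, carrying its lowest numval.
   SELECT sometext, min(numval) AS numval
   FROM   t
   GROUP  BY sometext
) s;
-- no_dupe: a b c
```

Each value is placed by its lowest numval, which matches the "first appearance" ordering asked for in the question.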

@Gordon's answer raises a good point: other columns may be needed as well. In that case, DISTINCT ON is recommended:

select x, string_agg(sometext, ' ' order by numval)
from (
    select distinct on (sometext) *
    from t
    order by sometext, numval
) s
group by x
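
Note that DISTINCT ON (sometext) deduplicates across the whole table, so a value that occurs in several x groups survives in only one of them. If duplicates should instead be removed per group, a possible variant (a sketch, with hypothetical sample rows standing in for t) adds x to the DISTINCT ON list:

```sql
-- Hypothetical sample rows; 'a' occurs in both groups x = 1 and x = 2.
WITH t(x, sometext, numval) AS (
   VALUES (1, 'a', 1), (1, 'b', 2), (1, 'a', 3)
        , (2, 'a', 4), (2, 'c', 5)
)
SELECT x, string_agg(sometext, ' ' ORDER BY numval) AS no_dupe
FROM  (
   -- Keep the row with the lowest numval per (x, sometext) pair.
   SELECT DISTINCT ON (x, sometext) *
   FROM   t
   ORDER  BY x, sometext, numval
) s
GROUP  BY x
ORDER  BY x;
-- x = 1: a b
-- x = 2: a c
```

With DISTINCT ON (sometext) alone, the same data would yield only 'c' for x = 2, because the single surviving 'a' row belongs to x = 1.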
Answered by Clodoaldo Neto on Sep 28 '22 at 07:09


Building on DISTINCT ON

SELECT string_agg(sometext, ' ' ORDER BY numval) AS no_dupe
FROM  (
    SELECT DISTINCT ON (1,2) <whatever>, sometext, numval
    FROM   tbl
    ORDER  BY 1,2,3
    ) sub;

This is the simpler equivalent of @Gordon's query.
From your description alone I would have suggested @Clodoaldo's simpler variant.

uniq() for integer

For integer values instead of text, the additional module intarray has just the thing for you:

uniq(int[])     int[]   remove adjacent duplicates

Install it once per database with:

CREATE EXTENSION intarray;

Then the query is simply:

SELECT uniq(array_agg(some_int ORDER BY <whatever>, numval)) AS no_dupe
FROM   tbl;

The result is an array; wrap it in array_to_string() if you need a string. Related:

  • How to create an index for elements of an array in PostgreSQL?
  • Compare arrays for equality, ignoring order of elements
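
The adjacent-only behavior of uniq() can be checked on a literal array. This is a small sketch that assumes the intarray extension is installed; the array literal is made up:

```sql
-- Requires: CREATE EXTENSION intarray;
-- Only adjacent duplicates are removed, so the later 1 is kept.
SELECT uniq('{1,1,2,2,1,3}'::int[]) AS adjacent_removed;
-- {1,2,1,3}

-- Wrapped in array_to_string() to get a string instead of an array.
SELECT array_to_string(uniq('{1,1,2,2,1,3}'::int[]), ' ') AS as_string;
-- 1 2 1 3

-- Sorting first (sort() is also provided by intarray) removes all duplicates.
SELECT uniq(sort('{1,1,2,2,1,3}'::int[])) AS all_removed;
-- {1,2,3}
```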

In fact, it wouldn't be hard to create a custom aggregate function to do the same with text ...

Custom aggregate function for any data type

A function that appends the next element to the array only if it differs from the previous one (NULL values are removed!):

CREATE OR REPLACE FUNCTION f_array_append_uniq (anyarray, anyelement)
  RETURNS anyarray
  LANGUAGE sql STRICT IMMUTABLE AS
'SELECT CASE WHEN $1[array_upper($1, 1)] = $2 THEN $1 ELSE $1 || $2 END';
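
Called directly, the transition function behaves as follows. This is a sketch with made-up arguments, assuming the function above has been created; the explicit casts are needed so the polymorphic types can be resolved:

```sql
-- Last element equals the new one: array is returned unchanged.
SELECT f_array_append_uniq('{a,b}'::text[], 'b'::text);  -- {a,b}

-- Last element differs: the new element is appended.
SELECT f_array_append_uniq('{a,b}'::text[], 'c'::text);  -- {a,b,c}

-- Empty array: array_upper() returns NULL, the comparison yields NULL,
-- and the ELSE branch appends the element.
SELECT f_array_append_uniq('{}'::text[], 'a'::text);     -- {a}
```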

Polymorphic types make it work for any scalar data type. The custom aggregate function:

CREATE AGGREGATE array_agg_uniq(anyelement) (
   SFUNC = f_array_append_uniq
 , STYPE = anyarray
 , INITCOND = '{}'
);

Call:

SELECT array_to_string(
          array_agg_uniq(sometext ORDER BY <whatever>, numval)
        , ' ') AS no_dupe
FROM   tbl;
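
A small worked call, assuming the function and aggregate above have been created. The inline rows are hypothetical and stand in for tbl:

```sql
-- Hypothetical sample rows; 'a' repeats adjacently and again later.
WITH tbl(sometext, numval) AS (
   VALUES ('a', 1), ('a', 2), ('b', 3), ('a', 4)
)
SELECT array_to_string(
          array_agg_uniq(sometext ORDER BY numval)
        , ' ') AS no_dupe
FROM   tbl;
-- no_dupe: a b a
```

Unlike DISTINCT, this only collapses runs of equal values: the later 'a' is kept because it does not directly follow another 'a' in the sort order.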

Note that the aggregate is PARALLEL UNSAFE (default) by nature, even though the transition function could be marked PARALLEL SAFE.
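
If desired, the transition function could be marked accordingly. This is a sketch that assumes PostgreSQL 9.6 or later, where the PARALLEL attribute exists; the aggregate itself keeps its default:

```sql
-- Marks only the transition function; the aggregate array_agg_uniq
-- stays PARALLEL UNSAFE unless declared otherwise in CREATE AGGREGATE.
ALTER FUNCTION f_array_append_uniq(anyarray, anyelement) PARALLEL SAFE;
```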

Related answer:

  • Custom PostgreSQL aggregate for circular average
Answered by Erwin Brandstetter on Sep 28 '22 at 07:09