I'm trying to order the output order of some distinct aggregated text based on the value of another column with something like:
string_agg(DISTINCT sometext, ' ' ORDER BY numval)
However, that results in the error:
ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
I do understand why this is, since the ordering would be "ill-defined" if the numval
of two repeated values differs, with that of another lying in-between.
Ideally, I would like to order them by first appearance / lowest order-by value, but the ill-defined cases are actually rare enough in my data (it's mostly sequentially repeated values that I want to get rid of with the DISTINCT
) that I ultimately don't particularly care about their ordering and would be happy with something like MySQL's GROUP_CONCAT(DISTINCT sometext ORDER BY numval SEPARATOR ' ')
that simply works despite its sloppiness.
I expect some Postgres contortionism will be necessary, but I don't really know what the most efficient/concise way of going about this would be.
Eliminate the need to do a distinct by pre aggregating
select string_agg(sometext, ' ' order by numval)
from (
select sometext, min(numval) as numval
from t
group by sometext
) s
@Gordon's answer brought a good point. That is if there are other needed columns. In this case a distinct on
is recommended
select x, string_agg(sometext, ' ' order by numval)
from (
select distinct on (sometext) *
from t
order by sometext, numval
) s
group by x
DISTINCT ON
SELECT string_agg(sometext, ' ' ORDER BY numval) AS no_dupe
FROM (
SELECT DISTINCT ON (1,2) <whatever>, sometext, numval
FROM tbl
ORDER BY 1,2,3
) sub;
This is the simpler equivalent of @Gordon's query.
From your description alone I would have suggested @Clodoaldo's simpler variant.
uniq()
for integerFor integer
values instead of text
, the additional module intarray
has just the thing for you:
uniq(int[]) int[] remove adjacent duplicates
Install it once per database with:
CREATE EXTENSION intarray;
Then the query is simply:
SELECT uniq(array_agg(some_int ORDER BY <whatever>, numval)) AS no_dupe
FROM tbl;
Result is an array, wrap it in array_to_string()
if you need a string.
Related:
In fact, it wouldn't be hard to create a custom aggregate function to do the same with text
...
Function that only adds next element to array if it is different from the previous. (NULL
values are removed!):
CREATE OR REPLACE FUNCTION f_array_append_uniq (anyarray, anyelement)
RETURNS anyarray
LANGUAGE sql STRICT IMMUTABLE AS
'SELECT CASE WHEN $1[array_upper($1, 1)] = $2 THEN $1 ELSE $1 || $2 END';
Using polymorphic types to make it work for any scalar data-type. Custom aggregate function:
CREATE AGGREGATE array_agg_uniq(anyelement) (
SFUNC = f_array_append_uniq
, STYPE = anyarray
, INITCOND = '{}'
);
Call:
SELECT array_to_string(
array_agg_uniq(sometext ORDER BY <whatever>, numval)
, ' ') AS no_dupe
FROM tbl;
Note that the aggregate is PARALLEL UNSAFE
(default) by nature, even though the transition function could be marked PARALLEL SAFE
.
Related answer:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With