I have an rowset with two columns: technical_id
and natural_id
. The rowset is actually result of complex query. The mapping between columns values is assumed to be bijective (i.e. for two rows with same technical_id
the natural_id
s are same too, for distinct technical_id
s the natural_id
s are distinct too). The (technical_id,natural_id)
pairs are not unique in rowset because of joins in original query. Example:
with t (technical_id, natural_id, val) as (values
(1, 'a', 1),
(1, 'a', 2),
(2, 'b', 3),
(2, 'b', 2),
(3, 'c', 0),
(3, 'c', 1),
(4, 'd', 1)
)
Unfortunately, the bijection is enforced only by application logic. The natural_id
is actually collected from multiple tables and composed using coalesce
-based expression so its uniqueness hardly can be enforced by db constraint.
I need to aggregate rows of rowset by technical_id
assuming the natural_id
is unique. If it isn't (for example if tuple (4, 'x', 1)
were added into sample data), the query should fail. In ideal SQL world I would use some hypothetical aggregate function:
select technical_id, only(natural_id), sum(val)
from t
group by technical_id;
I know there is not such function in SQL. Is there some alternative or workaround? Postgres-specific solutions are also ok.
Note that group by technical_id, natural_id
or select technical_id, max(natural_id)
- though working well in happy case - are both unacceptable (first because the technical_id
must be unique in result under all circumstances, second because the value is potentially random and masks data inconsistency).
Thanks for tips :-)
UPDATE: the expected answer is
technical_id,v,sum
1,a,3
2,b,5
3,c,1
4,d,1
or fail when 4,x,1
is also present.
SQL provides several aggregate functions for making calculations based on columns: AVG , MAX / MIN , SUM , and COUNT . You can use those by typing the function names, then putting the column name or * inside parentheses.
If you are using an aggregate function in a where clause then it means you want to filter data on the basis of that aggregation function.
You can get only the "unique" natural ids using:
select technical_id, max(natural_id), sum(val)
from t
group by technical_id
having min(natural_id) = max(natural_id);
If you want the query to actually fail, that is a little hard to guarantee. Here is a hacky way to do it:
select technical_id, max(natural_id), sum(val)
from t
group by technical_id
having (case when min(natural_id) = max(natural_id) then 0 else 1 / (count(*) - count(*)) end) = 0;
And a db<>fiddle illustrating this.
Seems I've finally found solution based on single-row cardinality of correlated subquery in select clause:
select technical_id,
(select v from unnest(array_agg(distinct natural_id)) as u(v)) as natural_id,
sum(val)
from t
group by technical_id;
This is the simplest solution for my situation at this moment so I'll resort to self-accept. Anyway if some disadvantages show, I will describe them here and reaccept to other answer. I appreciate all other proposals and believe they will be valuable for anybody too.
You can use
SELECT technical_id, max(natural_id), count(natural_id)
...
GROUP BY technical_id;
and throw an error whenever the count is not 1.
If you want to guarantee the constraint with the database, you could do one of these:
Do away with the artificial primary key.
Do something complicated like this:
CREATE TABLE id_map (
technical_id bigint UNIQUE NOT NULL,
natural_id text UNIQUE NOT NULL,
PRIMARY KEY (technical_id, natural_id)
);
ALTER TABLE t
ADD FOREIGN KEY (technical_id, natural_id) REFERENCES id_map;
You can create your own aggregates. ONLY is a keyword, so best not use it as the name of an aggregate. Not willing to put much time into deciding, I called it only2.
CREATE OR REPLACE FUNCTION public.only_agg(anyelement, anyelement)
RETURNS anyelement
LANGUAGE plpgsql
IMMUTABLE
AS $function$
BEGIN
if $1 is null then return $2; end if;
if $2 is null then return $1; end if;
if $1=$2 then return $1; end if;
raise exception 'not only';
END $function$;
create aggregate only2 (anyelement) ( sfunc = only_agg, stype = anyelement);
It might not do the thing you want with NULL inputs, but I don't know what you want in that case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With