Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL aggregation function to choose the only value

I have an rowset with two columns: technical_id and natural_id. The rowset is actually result of complex query. The mapping between columns values is assumed to be bijective (i.e. for two rows with same technical_id the natural_ids are same too, for distinct technical_ids the natural_ids are distinct too). The (technical_id,natural_id) pairs are not unique in rowset because of joins in original query. Example:

with t (technical_id, natural_id, val) as (values
  (1, 'a', 1),
  (1, 'a', 2),
  (2, 'b', 3),
  (2, 'b', 2),
  (3, 'c', 0),
  (3, 'c', 1),
  (4, 'd', 1)
)

Unfortunately, the bijection is enforced only by application logic. The natural_id is actually collected from multiple tables and composed using coalesce-based expression so its uniqueness hardly can be enforced by db constraint.

I need to aggregate rows of rowset by technical_id assuming the natural_id is unique. If it isn't (for example if tuple (4, 'x', 1) were added into sample data), the query should fail. In ideal SQL world I would use some hypothetical aggregate function:

select technical_id, only(natural_id), sum(val)
from t
group by technical_id;

I know there is not such function in SQL. Is there some alternative or workaround? Postgres-specific solutions are also ok.

Note that group by technical_id, natural_id or select technical_id, max(natural_id) - though working well in happy case - are both unacceptable (first because the technical_id must be unique in result under all circumstances, second because the value is potentially random and masks data inconsistency).

Thanks for tips :-)

UPDATE: the expected answer is

technical_id,v,sum
1,a,3
2,b,5
3,c,1
4,d,1

or fail when 4,x,1 is also present.

like image 853
Tomáš Záluský Avatar asked Jan 23 '20 15:01

Tomáš Záluský


People also ask

Can we use SELECT with aggregate function?

SQL provides several aggregate functions for making calculations based on columns: AVG , MAX / MIN , SUM , and COUNT . You can use those by typing the function names, then putting the column name or * inside parentheses.

Can we use aggregate function to filter values in WHERE clause?

If you are using an aggregate function in a where clause then it means you want to filter data on the basis of that aggregation function.


4 Answers

You can get only the "unique" natural ids using:

select technical_id, max(natural_id), sum(val)
from t
group by technical_id
having min(natural_id) = max(natural_id);

If you want the query to actually fail, that is a little hard to guarantee. Here is a hacky way to do it:

select technical_id, max(natural_id), sum(val)
from t
group by technical_id
having (case when min(natural_id) = max(natural_id) then 0 else 1 / (count(*) - count(*)) end) = 0;

And a db<>fiddle illustrating this.

like image 172
Gordon Linoff Avatar answered Nov 02 '22 22:11

Gordon Linoff


Seems I've finally found solution based on single-row cardinality of correlated subquery in select clause:

select technical_id,
       (select v from unnest(array_agg(distinct natural_id)) as u(v)) as natural_id,
       sum(val)
from t
group by technical_id;

This is the simplest solution for my situation at this moment so I'll resort to self-accept. Anyway if some disadvantages show, I will describe them here and reaccept to other answer. I appreciate all other proposals and believe they will be valuable for anybody too.

like image 20
Tomáš Záluský Avatar answered Nov 02 '22 23:11

Tomáš Záluský


You can use

SELECT technical_id, max(natural_id), count(natural_id)
...
GROUP BY technical_id;

and throw an error whenever the count is not 1.

If you want to guarantee the constraint with the database, you could do one of these:

  1. Do away with the artificial primary key.

  2. Do something complicated like this:

    CREATE TABLE id_map (
       technical_id bigint UNIQUE NOT NULL,
       natural_id text UNIQUE NOT NULL,
       PRIMARY KEY (technical_id, natural_id)
    );
    
    ALTER TABLE t
       ADD FOREIGN KEY (technical_id, natural_id) REFERENCES id_map;
    
like image 22
Laurenz Albe Avatar answered Nov 02 '22 23:11

Laurenz Albe


You can create your own aggregates. ONLY is a keyword, so best not use it as the name of an aggregate. Not willing to put much time into deciding, I called it only2.

CREATE OR REPLACE FUNCTION public.only_agg(anyelement, anyelement)
 RETURNS anyelement
 LANGUAGE plpgsql
 IMMUTABLE
AS $function$
BEGIN 
  if $1 is null then return $2; end if; 
  if $2 is null then return $1; end if; 
  if $1=$2 then return $1; end if; 
  raise exception 'not only';  
END $function$;

create aggregate only2 (anyelement) ( sfunc = only_agg, stype = anyelement);

It might not do the thing you want with NULL inputs, but I don't know what you want in that case.

like image 44
jjanes Avatar answered Nov 02 '22 22:11

jjanes