
SQL aggregation function to choose the only value

Tags: sql, unique, postgresql, aggregate-functions, aggregate

I have a rowset with two columns: technical_id and natural_id. The rowset is actually the result of a complex query. The mapping between the column values is assumed to be bijective (i.e. for two rows with the same technical_id the natural_ids are the same too, and for distinct technical_ids the natural_ids are distinct too). The (technical_id, natural_id) pairs are not unique in the rowset because of joins in the original query. Example:

with t (technical_id, natural_id, val) as (values
  (1, 'a', 1),
  (1, 'a', 2),
  (2, 'b', 3),
  (2, 'b', 2),
  (3, 'c', 0),
  (3, 'c', 1),
  (4, 'd', 1)
)

Unfortunately, the bijection is enforced only by application logic. The natural_id is actually collected from multiple tables and composed using a coalesce-based expression, so its uniqueness can hardly be enforced by a db constraint.
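Since the bijection is only enforced by the application, it can be useful to check both directions of the mapping directly. A minimal sketch, assuming PostgreSQL; it uses the sample data with a hypothetical inconsistent row (4, 'x', 1) added, and the output column names key_kind, key_value and mapped_to are only illustrative:

```sql
-- Sample data plus the hypothetical inconsistent row (4, 'x', 1)
with t (technical_id, natural_id, val) as (values
  (1, 'a', 1), (1, 'a', 2),
  (2, 'b', 3), (2, 'b', 2),
  (3, 'c', 0), (3, 'c', 1),
  (4, 'd', 1), (4, 'x', 1)
)
-- technical_ids that map to more than one distinct natural_id
select 'technical_id' as key_kind,
       technical_id::text as key_value,
       string_agg(distinct natural_id, ',') as mapped_to
from t
group by technical_id
having count(distinct natural_id) > 1
union all
-- natural_ids that map to more than one distinct technical_id
select 'natural_id',
       natural_id,
       string_agg(distinct technical_id::text, ',')
from t
group by natural_id
having count(distinct technical_id) > 1;
```

With this data it should return a single row, for technical_id 4; an empty result means the rowset respects the bijection.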

I need to aggregate the rows of the rowset by technical_id, assuming the natural_id is unique. If it isn't (for example, if the tuple (4, 'x', 1) were added to the sample data), the query should fail. In an ideal SQL world I would use some hypothetical aggregate function:

select technical_id, only(natural_id), sum(val)
from t
group by technical_id;

I know there is no such function in SQL. Is there some alternative or workaround? Postgres-specific solutions are also OK.

Note that group by technical_id, natural_id and select technical_id, max(natural_id), though they work well in the happy case, are both unacceptable: the first because technical_id must be unique in the result under all circumstances, the second because the returned value is arbitrary and masks the data inconsistency.

Thanks for any tips :-)

UPDATE: the expected result is

technical_id,v,sum
1,a,3
2,b,5
3,c,1
4,d,1

or fail when 4,x,1 is also present.


asked Jan 23 '20 15:01

Tomáš Záluský

4 Answers

You can get only the "unique" natural ids using:

select technical_id, max(natural_id), sum(val)
from t
group by technical_id
having min(natural_id) = max(natural_id);

If you want the query to actually fail, that is a little hard to guarantee. Here is a hacky way to do it:

select technical_id, max(natural_id), sum(val)
from t
group by technical_id
having (case when min(natural_id) = max(natural_id) then 0 else 1 / (count(*) - count(*)) end) = 0;

And a db<>fiddle illustrating this.
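A variant of the same hack (not from the answer, PostgreSQL-specific) makes the error message name the offending key: casting a descriptive string to int fails with an "invalid input syntax" error whose text includes that string. A sketch on the question's sample data; with (4, 'x', 1) added, it should abort with a message mentioning technical_id 4:

```sql
with t (technical_id, natural_id, val) as (values
  (1, 'a', 1), (1, 'a', 2),
  (2, 'b', 3), (2, 'b', 2),
  (3, 'c', 0), (3, 'c', 1),
  (4, 'd', 1)
)
select technical_id, max(natural_id) as natural_id, sum(val)
from t
group by technical_id
-- the else branch is evaluated only for inconsistent groups; the failed
-- cast aborts the query and its error text identifies the technical_id
having (case when min(natural_id) = max(natural_id) then 0
             else ('inconsistent natural_id for technical_id ' || technical_id)::int
        end) = 0
order by technical_id;
```

As with the division-by-zero version, a group whose natural_id values are all NULL also takes the else branch, because min and max are then NULL.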


answered Nov 02 '22 22:11

Gordon Linoff

It seems I've finally found a solution based on the single-row cardinality of a correlated subquery in the select clause:

select technical_id,
       (select v from unnest(array_agg(distinct natural_id)) as u(v)) as natural_id,
       sum(val)
from t
group by technical_id;

This is the simplest solution for my situation at the moment, so I'll self-accept. If any disadvantages show up, I will describe them here and accept another answer instead. I appreciate all the other proposals and believe they will be valuable to others too.
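If the trick is needed in several queries, it could be wrapped in a small polymorphic SQL function. A sketch, assuming PostgreSQL; the name only_value is hypothetical. The body keeps the scalar subquery on purpose: a plain select v from unnest(vals) body would silently return the first row instead of failing, because a non-set SQL function returns the first row of its last query.

```sql
-- Hypothetical helper around the scalar-subquery trick: the inner subquery
-- raises "more than one row returned by a subquery used as an expression"
-- when the array holds more than one element, and yields NULL when it is empty.
create or replace function only_value(vals anyarray)
  returns anyelement
  language sql
  immutable
as $$
  select (select v from unnest(vals) as u(v))
$$;

with t (technical_id, natural_id, val) as (values
  (1, 'a', 1), (1, 'a', 2),
  (2, 'b', 3), (2, 'b', 2),
  (3, 'c', 0), (3, 'c', 1),
  (4, 'd', 1)
)
select technical_id,
       only_value(array_agg(distinct natural_id)) as natural_id,
       sum(val)
from t
group by technical_id
order by technical_id;
```

Adding (4, 'x', 1) to the data should make the query abort with a cardinality_violation error (SQLSTATE 21000), just like the inline version.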

answered Nov 02 '22 23:11

Tomáš Záluský

You can use

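SELECT technical_id, max(natural_id), count(DISTINCT natural_id)
...
GROUP BY technical_id;

and throw an error whenever the count is not 1. The DISTINCT matters here: because the (technical_id, natural_id) pairs repeat in the rowset, a plain count(natural_id) would exceed 1 even for consistent data.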

If you want the database to guarantee the constraint, you could do one of these:

1. Do away with the artificial primary key.

2. Do something complicated like this:

CREATE TABLE id_map (
   technical_id bigint UNIQUE NOT NULL,
   natural_id text UNIQUE NOT NULL,
   PRIMARY KEY (technical_id, natural_id)
);

ALTER TABLE t
   ADD FOREIGN KEY (technical_id, natural_id) REFERENCES id_map;
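To see what these constraints catch, a throwaway sketch, assuming the rowset lives in a real table t (in the question it is a CTE, so this table definition is hypothetical). The composite primary key is what allows t to reference both columns at once, while the two UNIQUE constraints make the mapping one-to-one:

```sql
-- Temporary tables so the demo leaves nothing behind
create temp table id_map (
  technical_id bigint unique not null,
  natural_id   text   unique not null,
  primary key (technical_id, natural_id)
);

create temp table t (
  technical_id bigint,
  natural_id   text,
  val          int,
  -- references the composite primary key of id_map
  foreign key (technical_id, natural_id) references id_map
);

insert into id_map values (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
insert into t values
  (1, 'a', 1), (1, 'a', 2), (2, 'b', 3), (2, 'b', 2),
  (3, 'c', 0), (3, 'c', 1), (4, 'd', 1);

-- Each of these should be rejected:
-- insert into t values (4, 'x', 1);    -- foreign key: (4, 'x') is not in id_map
-- insert into id_map values (4, 'x');  -- unique violation on technical_id
-- insert into id_map values (5, 'd');  -- unique violation on natural_id
```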

answered Nov 02 '22 23:11

Laurenz Albe

You can create your own aggregates. ONLY is a keyword, so it is best not to use it as the name of an aggregate. Not wanting to spend much time deciding, I called it only2.

CREATE OR REPLACE FUNCTION public.only_agg(anyelement, anyelement)
 RETURNS anyelement
 LANGUAGE plpgsql
 IMMUTABLE
AS $function$
BEGIN 
  if $1 is null then return $2; end if; 
  if $2 is null then return $1; end if; 
  if $1=$2 then return $1; end if; 
  raise exception 'not only';  
END $function$;

create aggregate only2 (anyelement) ( sfunc = only_agg, stype = anyelement);

It might not do what you want with NULL inputs (NULLs are simply skipped, so an all-NULL group yields NULL), but I don't know what you want in that case.
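A variant of this aggregate that names the conflicting values in its error message can make the inconsistency easier to track down. A sketch, assuming PostgreSQL; single_value_sfunc and single_value are hypothetical names, and NULL inputs are skipped as in the answer:

```sql
-- Transition function: keeps the single non-NULL value seen so far and
-- raises an error as soon as a different non-NULL value shows up.
create or replace function single_value_sfunc(acc anyelement, val anyelement)
  returns anyelement
  language plpgsql
  immutable
as $$
begin
  if acc is null then return val; end if;  -- first value (or only NULLs so far)
  if val is null then return acc; end if;  -- skip NULL inputs
  if acc = val then return acc; end if;    -- same value again
  raise exception 'more than one distinct value: % and %', acc, val;
end;
$$;

-- No initcond, so the state starts as NULL
create aggregate single_value (anyelement) (
  sfunc = single_value_sfunc,
  stype = anyelement
);

with t (technical_id, natural_id, val) as (values
  (1, 'a', 1), (1, 'a', 2),
  (2, 'b', 3), (2, 'b', 2),
  (3, 'c', 0), (3, 'c', 1),
  (4, 'd', 1)
)
select technical_id, single_value(natural_id) as natural_id, sum(val)
from t
group by technical_id
order by technical_id;
```

With (4, 'x', 1) added to the data, the query should instead abort with an error mentioning both d and x.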

answered Nov 02 '22 22:11

jjanes
