PostgreSQL: how to get the primary key from a GROUP BY clause?

This is a query which selects a set of desired rows:

select max(a), b, c, d, e
from T
group by b, c, d, e;

The table has a primary key, in column id.

I would like to identify these rows in a further query, by getting the primary key from each of those rows. How would I do that? This does not work:

select id, max(a), b, c, d, e
from T 
group by b, c, d, e;

ERROR:  column "T.id" must appear in the GROUP BY clause or be used in an aggregate function

After poking around in some other PostgreSQL questions I also tried this, but no luck:

select distinct on (id) id, max(a), b, c, d, e
from T 
group by b, c, d, e;

ERROR:  column "T.id" must appear in the GROUP BY clause or be used in an aggregate function

What do I do? I know there can only be one id for each result, because it's a primary key... I literally want the primary key along with the rest of the data, for each row that the initial (working) query returns.

asked Oct 28 '11 21:10 by Claudiu



2 Answers

If you don't care which id you get, you just need to wrap id in an aggregate function that is guaranteed to return an id that exists in the group. The max and min aggregates come to mind:

-- Or min(id) if you want better spiritual balance.
select max(id), max(a), b, c, d, e
from T 
group by b, c, d, e;
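
One thing worth checking before relying on this: max(id) and max(a) are computed independently, so the id you get back need not belong to the row that holds the largest a. A minimal sketch with made-up data (the column types and sample values are assumptions, not taken from the question):

```sql
-- Hypothetical table shape and sample rows, for illustration only.
create temp table T (
    id integer primary key,
    a  integer,
    b  integer,
    c  integer,
    d  integer,
    e  integer
);

insert into T (id, a, b, c, d, e) values
    (1, 10, 1, 1, 1, 1),  -- holds the largest a in its group
    (2,  5, 1, 1, 1, 1);  -- same group, larger id, smaller a

-- Both aggregates are evaluated over the whole group, independently:
-- this returns id = 2 together with a = 10, although row 2 has a = 5.
select max(id) as id, max(a) as a, b, c, d, e
from T
group by b, c, d, e;
```

So this is fine when any id from the group will do, but not when the id has to identify the row with the maximum a.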

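Depending on your data, I think using a window function would be a better plan (thanks to evil otto for the boot to the head). It returns the id of the row that actually holds the largest a in each group. Note that rank() returns every row that ties for the largest a; use row_number() instead if you need exactly one row per group: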

select id, a, b, c, d, e
from (
    select id, a, b, c, d, e, rank() over (partition by b,c,d,e order by a desc) as r
    from T
) as dt
where r = 1;
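
PostgreSQL's DISTINCT ON can probably express the same thing without a subquery. The following is a sketch rather than a drop-in answer; the trailing id in the ORDER BY is an assumption added here so that ties on a are broken deterministically:

```sql
-- Keeps the first row of each (b, c, d, e) group according to ORDER BY,
-- i.e. the row with the largest a; ties on a go to the smallest id.
-- The leading ORDER BY expressions must match the DISTINCT ON list.
select distinct on (b, c, d, e) id, a, b, c, d, e
from T
order by b, c, d, e, a desc, id;
```

If a can be NULL, consider writing a desc nulls last, since descending order puts NULLs first by default.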
answered Sep 19 '22 17:09 by mu is too short


Because you are grouping, there can be (and likely will be) more than one matching row, and therefore more than one id value, per returned row.

PostgreSQL is pretty strict - it will not guess at what you mean.

  1. You could run a subquery.
  2. You could run another query based on b, c, d, e.
  3. You could use the array_agg aggregate function to get an array of id values per returned row.

See this question: Postgresql GROUP_CONCAT equivalent?

I suggest you consider #3 as the most efficient of the possibilities.
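
As a rough sketch of options 1 and 3 (the aliases src, g, max_a and ids are made up for illustration):

```sql
-- Option 1: compute the grouped maximum in a subquery, then join back to T
-- to pick up the id of every row that matches it. Rows that tie on a within
-- a group are all returned. Note that = never matches NULL, so groups with
-- a NULL in b, c, d or e would need IS NOT DISTINCT FROM instead.
select src.id, src.a, src.b, src.c, src.d, src.e
from T as src
join (
    select max(a) as max_a, b, c, d, e
    from T
    group by b, c, d, e
) as g
  on  src.a = g.max_a
  and src.b = g.b
  and src.c = g.c
  and src.d = g.d
  and src.e = g.e;

-- Option 3: collect every id of each group into an array.
-- This yields all ids in the group, not only the one holding max(a).
select array_agg(id order by id) as ids, max(a) as max_a, b, c, d, e
from T
group by b, c, d, e;
```

The array from option 3 can then be fed to a further query with a condition along the lines of where id = any(ids).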

Hope this helps. Thanks!

answered Sep 19 '22 17:09 by gahooa