Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

DISTINCT ON in an aggregate function in postgres

For my problem, we have a schema whereby one photo has many tags and also many comments. So if I have a query where I want all the comments and tags, it will multiply the rows together. So if one photo has 2 tags and 13 comments, I get 26 rows for that one photo:

SELECT         tag.name,          comment.comment_id FROM         photo         LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id         LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id         LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id 

enter image description here

That's fine for most things, but it means that if I GROUP BY and then json_agg(tag.*), I get 13 copies of the first tag, and 13 copies of the second tag.

SELECT json_agg(tag.name) as tags FROM         photo         LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id         LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id         LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id GROUP BY photo.photo_id 

enter image description here

Instead I want an array that is only 'suburban' and 'city', like this:

 [       {"tag_id":1,"name":"suburban"},        {"tag_id":2,"name":"city"}  ] 

I could json_agg(DISTINCT tag.name), but this will only make an array of tag names, when I want the entire row as json. I would like to json_agg(DISTINCT ON(tag.name) tag.*), but that's not valid SQL apparently.

How then can I simulate DISTINCT ON inside an aggregate function in Postgres?

like image 602
Migwell Avatar asked May 06 '15 13:05

Migwell


People also ask

Can we use distinct in aggregate function?

You can use DISTINCT to eliminate duplicate values in aggregate function calculations. By default, aggregate functions operate on all the values supplied. A common function used to count the number of rows in the group if no column name is specified.

What is distinct on in PostgreSQL?

PostgreSQL also provides on an expression as DISTINCT ON that is used with the SELECT statement to remove duplicates from a query set result just like the DISTINCT clause.In addition to that it also keeps the “first row” of each row of duplicates in the query set result.

How do you use distinct aggregate functions?

With SUM(), AVG(), and COUNT(expr), DISTINCT eliminates duplicate values before the sum, average, or count is calculated. DISTINCT isn't meaningful with MIN() and MAX(); you can use it, but it won't change the result. You can't use DISTINCT with COUNT(*).

How do I get unique records in PostgreSQL?

Removing duplicate rows from a query result set in PostgreSQL can be done using the SELECT statement with the DISTINCT clause. It keeps one row for each group of duplicates. The DISTINCT clause can be used for a single column or for a list of columns.


2 Answers

Whenever you have a central table and want to left-join it to many rows in table A and also left-join it to many rows in table B, you get these problems of duplicating rows. It can especially throw off aggregrate functions like COUNT and SUM if you're not careful! So I think you need to construct your tags-for-each-photo and comments-for-each-photo separately, and then join them together:

WITH tags AS (   SELECT  photo.photo_id, json_agg(row_to_json(tag.*)) AS tags   FROM    photo   LEFT OUTER JOIN photo_tag on photo_tag.photo_id = photo.photo_id   LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id   GROUP BY photo.photo_id ), comments AS (   SELECT  photo.photo_id, json_agg(row_to_json(comment.*)) AS comments   FROM    photo   LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id   GROUP BY photo.photo_id ) SELECT  COALESCE(tags.photo_id, comments.photo_id) AS photo_id,         tags.tags,         comments.comments FROM    tags FULL OUTER JOIN comments ON      tags.photo_id = comments.photo_id 

EDIT: If you really want to join everything together without CTEs, this looks like it gives correct results:

SELECT  photo.photo_id,         to_json(array_agg(DISTINCT tag.*)) AS tags,         to_json(array_agg(DISTINCT comment.*)) AS comments FROM    photo LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id LEFT OUTER JOIN photo_tag on photo_tag.photo_id = photo.photo_id LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id GROUP BY photo.photo_id 
like image 77
Paul A Jungwirth Avatar answered Oct 02 '22 04:10

Paul A Jungwirth


The most simple thing I discovered is to use DISTINCT over jsonb (not json!). (jsonb_build_object creates jsonb objects)

SELECT     JSON_AGG(        DISTINCT jsonb_build_object('tag_id', photo_tag.tag_id,                                    'name', tag.name)) AS tags FROM photo     LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id     LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id     LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id GROUP BY photo.photo_id 
like image 34
Eugene Kovalev Avatar answered Oct 02 '22 03:10

Eugene Kovalev