For my problem, we have a schema whereby one photo has many tags and also many comments. So if I have a query where I want all the comments and tags, it will multiply the rows together. So if one photo has 2 tags and 13 comments, I get 26 rows for that one photo:
SELECT tag.name, comment.comment_id FROM photo LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
That's fine for most things, but it means that if I GROUP BY
and then json_agg(tag.*)
, I get 13 copies of the first tag, and 13 copies of the second tag.
SELECT json_agg(tag.name) as tags FROM photo LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id GROUP BY photo.photo_id
Instead I want an array that is only 'suburban' and 'city', like this:
[ {"tag_id":1,"name":"suburban"}, {"tag_id":2,"name":"city"} ]
I could json_agg(DISTINCT tag.name)
, but this will only make an array of tag names, when I want the entire row as json. I would like to json_agg(DISTINCT ON(tag.name) tag.*)
, but that's not valid SQL apparently.
How then can I simulate DISTINCT ON
inside an aggregate function in Postgres?
You can use DISTINCT to eliminate duplicate values in aggregate function calculations. By default, aggregate functions operate on all the values supplied. A common function used to count the number of rows in the group if no column name is specified.
PostgreSQL also provides on an expression as DISTINCT ON that is used with the SELECT statement to remove duplicates from a query set result just like the DISTINCT clause.In addition to that it also keeps the “first row” of each row of duplicates in the query set result.
With SUM(), AVG(), and COUNT(expr), DISTINCT eliminates duplicate values before the sum, average, or count is calculated. DISTINCT isn't meaningful with MIN() and MAX(); you can use it, but it won't change the result. You can't use DISTINCT with COUNT(*).
Removing duplicate rows from a query result set in PostgreSQL can be done using the SELECT statement with the DISTINCT clause. It keeps one row for each group of duplicates. The DISTINCT clause can be used for a single column or for a list of columns.
Whenever you have a central table and want to left-join it to many rows in table A and also left-join it to many rows in table B, you get these problems of duplicating rows. It can especially throw off aggregrate functions like COUNT
and SUM
if you're not careful! So I think you need to construct your tags-for-each-photo and comments-for-each-photo separately, and then join them together:
WITH tags AS ( SELECT photo.photo_id, json_agg(row_to_json(tag.*)) AS tags FROM photo LEFT OUTER JOIN photo_tag on photo_tag.photo_id = photo.photo_id LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id GROUP BY photo.photo_id ), comments AS ( SELECT photo.photo_id, json_agg(row_to_json(comment.*)) AS comments FROM photo LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id GROUP BY photo.photo_id ) SELECT COALESCE(tags.photo_id, comments.photo_id) AS photo_id, tags.tags, comments.comments FROM tags FULL OUTER JOIN comments ON tags.photo_id = comments.photo_id
EDIT: If you really want to join everything together without CTEs, this looks like it gives correct results:
SELECT photo.photo_id, to_json(array_agg(DISTINCT tag.*)) AS tags, to_json(array_agg(DISTINCT comment.*)) AS comments FROM photo LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id LEFT OUTER JOIN photo_tag on photo_tag.photo_id = photo.photo_id LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id GROUP BY photo.photo_id
The most simple thing I discovered is to use DISTINCT
over jsonb
(not json!). (jsonb_build_object
creates jsonb objects)
SELECT JSON_AGG( DISTINCT jsonb_build_object('tag_id', photo_tag.tag_id, 'name', tag.name)) AS tags FROM photo LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id GROUP BY photo.photo_id
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With