How to combine DISTINCT and ORDER BY in array_agg of jsonb values in PostgreSQL

Note: I am using the latest version of Postgres (9.4)

I am trying to write a query which does a simple join of 2 tables, and groups by the primary key of the first table, and does an array_agg of several fields in the 2nd table which I want returned as an object. The array needs to be sorted by a combination of 2 fields in the json objects, and also uniquified.

So far, I have come up with the following:

SELECT
  zoo.id,
  ARRAY_AGG(
    DISTINCT ROW_TO_JSON((
      SELECT x
      FROM (
        SELECT animals.type, animals.name
      ) x
    ))::JSONB
    -- ORDER BY animals.type, animals.name
  )
FROM zoo
JOIN animals ON animals.zooId = zoo.id
GROUP BY zoo.id;

This results in one row for each zoo, with an aggregate array of jsonb objects, one for each animal, without duplicates.

However, I can't seem to figure out how to also sort this by the fields in the commented-out part of the code.

If I take out the DISTINCT, I can ORDER BY the original fields, which works great, but then I have duplicates.

Philberg asked Oct 31 '22 04:10


1 Answer

If you use row_to_json() on an anonymous row you lose the column names (the keys come back as f1, f2, ...) unless you pass in a row that is typed. If you build the object "manually" with json_build_object() using explicit names, and cast the result to jsonb, then you get them back:

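SELECT zoo.id,
       -- Sort inside the aggregate; row order from a joined subquery is not guaranteed
       array_agg(za.jb ORDER BY za.jb->>'animal_type', za.jb->>'animal_name') AS animals
FROM zoo
JOIN (
  -- De-duplicate on the plain columns before building the jsonb objects
  SELECT DISTINCT ON (zooId, "type", "name")
    zooId, json_build_object('animal_type', "type", 'animal_name', "name")::jsonb AS jb
  FROM animals
  -- DISTINCT ON expressions must match the leftmost ORDER BY expressions,
  -- and an output alias such as jb cannot be used inside an ORDER BY expression here
  ORDER BY zooId, "type", "name"
) AS za ON za.zooId = zoo.id
GROUP BY zoo.id;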

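You can ORDER BY the elements of a jsonb object, as shown above. DISTINCT on jsonb values does work (jsonb, unlike json, has an equality operator), but in your case it is rather inefficient: all the jsonb objects are built first and the duplicates are thrown out afterwards. At the aggregate level the combination you want is not possible, because PostgreSQL requires every ORDER BY expression of a DISTINCT aggregate to appear in the argument list, so sorting a DISTINCT array_agg of objects by animals.type and animals.name is rejected. You can achieve the same result, however, by applying the DISTINCT clause before building the jsonb object.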
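
To try the "de-duplicate first, build the objects afterwards" approach in isolation, here is a minimal sketch. The table and column names follow the question; the sample rows and the temporary tables are made up for illustration, and it assumes PostgreSQL 9.4 or later (json_build_object() is not available before that).

```sql
-- Hypothetical sample data; only the table and column names come from the question.
CREATE TEMP TABLE zoo (id int PRIMARY KEY);
CREATE TEMP TABLE animals (zooId int REFERENCES zoo (id), "type" text, "name" text);

INSERT INTO zoo VALUES (1), (2);
INSERT INTO animals VALUES
  (1, 'lion',  'Leo'),
  (1, 'lion',  'Leo'),    -- duplicate row, should end up in the array only once
  (1, 'bear',  'Baloo'),
  (2, 'zebra', 'Marty');

-- Step 1 (subquery): remove duplicates on the plain columns.
-- Step 2 (aggregate): build the jsonb objects and sort them by the plain columns.
SELECT z.id,
       array_agg(
         json_build_object('animal_type', a."type", 'animal_name', a."name")::jsonb
         ORDER BY a."type", a."name"
       ) AS animals
FROM zoo z
JOIN (SELECT DISTINCT zooId, "type", "name" FROM animals) a ON a.zooId = z.id
GROUP BY z.id
ORDER BY z.id;
```

With this data, zoo 1 should come back with two objects (the bear before the lion) and zoo 2 with one. Because the aggregate here has no DISTINCT of its own, its ORDER BY is free to use any expression, including columns that are not part of the aggregated value.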

Also, avoid use of SQL key words like "type" and standard data types like "name" as column names. Both are non-reserved keywords so you can use them in their proper contexts, but practically speaking your commands could get really confusing. You could, for instance, have a schema, with a table, a column in that table, and a data type each called "type" and then you could get this:

SELECT type::type FROM type.type WHERE type = something;

While PostgreSQL will graciously accept this, it is plain confusing at best and prone to error in all sorts of more complex situations. You can get a long way by double-quoting any key words, but they are best just avoided as identifiers.
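
If renaming is an option, a sketch like the following would remove the need for quoting altogether. It assumes the animals table from the question; the new column names are only suggestions, not something the question specifies.

```sql
-- Assumed table from the question; animal_type and animal_name are hypothetical new names.
ALTER TABLE animals RENAME COLUMN "type" TO animal_type;
ALTER TABLE animals RENAME COLUMN "name" TO animal_name;

-- Queries can then refer to the columns without double quotes.
SELECT DISTINCT zooId, animal_type, animal_name
FROM animals
ORDER BY zooId, animal_type, animal_name;
```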

Patrick answered Nov 15 '22 07:11