I'm joining several tables together in a postgres database, and returning the values in the right joined table as an aggregated JSON structure in the left joined table. However I find that that query becomes more complicated the more tables that are joined. For example:
select row_to_json(output)
from (
select image_type.name,
(
select json_agg(instances)
from (
select image_instance.name, (
select json_agg(versions)
from (
select image_version.name
from image_version
where image_version.image_instance_id = image_version.image_instance_id
) versions
) AS versions
from image_instance
where image_instance.image_type_id = image_type.image_type_id
) instances
) AS images
from image_type
) output;
I've joined three tables here, however I'd like to add several more tables to this, but the code will quickly become unwieldy and hard to maintain. Is there a simple way to generate these kinds of aggregated joins?
First of all, JSON is no different than regular fields when combining data from multiple tables: things can get complex quite quickly. There are, however, a few techniques to keep things manageable:
1. Daisy chain functions
There is no need to treat the output from each function independently, you can feed the output from one function as input to the next in a single statement. In your example this means that you lose a level of sub-select for each level of aggregation and you can forget about the aliases. Your example becomes:
select row_to_json(row(image_type.name, (
select json_agg(image_instance.name, (
select json_agg(image_version.name)
from image_version
where image_version.image_instance_id = image_instance.id) -- join edited
from image_instance
where image_instance.image_type_id = image_type.image_type_id))))
from image_type;
2. Don't use scalar sub-queries
This may be a matter of personal taste, but scalar sub-queries tend to be difficult to read (and write: you had an obvious error in the join condition of your innermost scalar sub-query, just to illustrate my point). Use regular sub-queries with explicit joins and aggregations instead:
select row_to_json(row(it.name, iiv.name))
from image_type it
join (select image_type_id, json_agg(name, iv_name) as name
from image_instance ii
join (select image_instance_id, json_agg(name) as iv_name
from image_version group by 1) iv on iv.image_instance_id = ii.id
group by 1) iiv using (image_type_id);
3. Modularize
Right there at the beginning of the documentation, in the Tutorial section (highly recommended reading, however proficient you think you are):
Making liberal use of views is a key aspect of good SQL database design.
create view iv_json as
select image_instance_id, json_agg(name) as iv_name
from image_version
group by 1;
create view ii_json as
select image_type_id, json_agg(name, iv_name) as name
from image_instance
join iv_json on image_instance_id = image_instance.id
group by 1;
Your main query now becomes:
select row_to_json(row(it.name, ii.name))
from image_type it
join ii_json ii using (image_type_id);
And so on...
This is obviously by far the easiest to code, test and maintain. Performance is a non-issue here: the query optimizer will flatten all the linked views into a single execution plan.
Final note: If you are using PG9.4+, you can use json_build_object()
instead of row_to_json()
for more intelligible output.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With