I'm joining several tables together in a PostgreSQL database, and returning the values in the right joined table as an aggregated JSON structure in the left joined table. However, I find that the query becomes more complicated as more tables are joined. For example:

select row_to_json(output)
from (
    select image_type.name,
    (
        select json_agg(instances)
        from (
            select image_instance.name, (
                select json_agg(versions)
                from (
                    select image_version.name
                    from image_version
                    where image_version.image_instance_id = image_version.image_instance_id
                ) versions
            ) AS versions
            from image_instance
            where image_instance.image_type_id = image_type.image_type_id
        ) instances
    ) AS images
    from image_type
) output;

I've joined three tables here, but I'd like to add several more, and the code will quickly become unwieldy and hard to maintain. Is there a simple way to generate these kinds of aggregated joins?

  • @a_horse_with_no_name could you expand on this? I read the article you referenced, but my data is stored in normalised tables rather than JSON blobs. I only wish to generate a JSON body from these tables. Perhaps if you could expand a little more it would help me understand. Commented Oct 19, 2016 at 19:02

1 Answer

First of all, JSON is no different than regular fields when combining data from multiple tables: things can get complex quite quickly. There are, however, a few techniques to keep things manageable:

1. Daisy chain functions

There is no need to treat the output from each function independently; you can feed the output from one function as input to the next in a single statement. In your example this means that you lose a level of sub-select for each level of aggregation and you can forget about the aliases. Your example becomes:

select row_to_json(row(image_type.name, (
       -- json_agg() takes a single argument, so each pair is wrapped in row()
       select json_agg(row(image_instance.name, (
              select json_agg(image_version.name)
              from image_version
              where image_version.image_instance_id = image_instance.id))) -- join edited
       from image_instance
       where image_instance.image_type_id = image_type.image_type_id)))
from image_type;

2. Don't use scalar sub-queries

This may be a matter of personal taste, but scalar sub-queries tend to be difficult to read (and write: you had an obvious error in the join condition of your innermost scalar sub-query, just to illustrate my point). Use regular sub-queries with explicit joins and aggregations instead:

select row_to_json(row(it.name, iiv.name))
from image_type it
-- json_agg() takes a single argument, so each pair is wrapped in row()
join (select image_type_id, json_agg(row(name, iv_name)) as name
      from image_instance ii
      join (select image_instance_id, json_agg(name) as iv_name
            from image_version group by 1) iv on iv.image_instance_id = ii.id
      group by 1) iiv using (image_type_id);
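
One caveat, offered as a sketch rather than a drop-in replacement: inner joins drop any image type without instances and any instance without versions, whereas the scalar sub-queries return those rows with a null aggregate. If such rows matter, a variant with left joins keeps them. The column names follow the query above (including the assumed `image_instance.id` key); the `instances` and `versions` aliases are illustrative only.

```sql
-- Sketch: same shape as the join-based query, but LEFT JOINs keep parents
-- that have no children (their aggregate column is simply NULL).
-- Assumes image_instance.id is the key referenced by image_version.
select row_to_json(row(it.name, iiv.instances))
from image_type it
left join (
    select ii.image_type_id,
           -- json_agg() accepts one expression, hence the row() wrapper
           json_agg(row(ii.name, iv.versions)) as instances
    from image_instance ii
    left join (
        select image_instance_id, json_agg(name) as versions
        from image_version
        group by image_instance_id
    ) iv on iv.image_instance_id = ii.id
    group by ii.image_type_id
) iiv using (image_type_id);
```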

3. Modularize

Right there at the beginning of the documentation, in the Tutorial section (highly recommended reading, however proficient you think you are):

Making liberal use of views is a key aspect of good SQL database design.

create view iv_json as
    select image_instance_id, json_agg(name) as iv_name
    from image_version
    group by 1;

create view ii_json as
    -- json_agg() takes a single argument, so each pair is wrapped in row()
    select image_type_id, json_agg(row(name, iv_name)) as name
    from image_instance
    join iv_json on image_instance_id = image_instance.id
    group by 1;

Your main query now becomes:

select row_to_json(row(it.name, ii.name))
from image_type it
join ii_json ii using (image_type_id);

And so on...

This is by far the easiest to code, test and maintain. Performance is generally not a concern here: PostgreSQL expands the linked views into the main query and plans everything as a single statement.

Final note: If you are using PG9.4+, you can use json_build_object() instead of row_to_json() for more intelligible output.
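
As a rough sketch of that final note, assuming PostgreSQL 9.4 or later and the same tables and columns as in the examples above, the views can build named objects instead of anonymous rows. The view names `iv_json_obj` and `ii_json_obj` and the JSON keys are hypothetical, chosen only for illustration:

```sql
-- Sketch only: requires PostgreSQL 9.4+ for json_build_object().
-- View names and JSON keys are illustrative; table and column names
-- (including image_instance.id) follow the examples above.

create view iv_json_obj as
    select image_instance_id,
           json_agg(name) as versions          -- array of version names
    from image_version
    group by image_instance_id;

create view ii_json_obj as
    select ii.image_type_id,
           -- one object per instance, carrying its nested versions array
           json_agg(json_build_object('name', ii.name,
                                      'versions', iv.versions)) as instances
    from image_instance ii
    join iv_json_obj iv on iv.image_instance_id = ii.id
    group by ii.image_type_id;

-- One JSON object per image type, with its instances nested under "images".
select json_build_object('name', it.name, 'images', ii.instances)
from image_type it
join ii_json_obj ii using (image_type_id);
```

Each result row is then a single object of the form {"name": ..., "images": [{"name": ..., "versions": [...]}]}, with explicit keys in place of the f1/f2 keys that row_to_json(row(...)) generates for anonymous rows.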

2 Comments

Thanks for your answer @Patrick, this is very comprehensive and has taught me how to use JSON views. I've tried doing this the way you outlined. However, the results don't appear to be nested; instead they are grouped in each column. Could you illustrate a way to nest each join as a field containing the aggregate?
Without seeing the table structures and the desired output format it is hard to say what the problem is, but you probably want to change the ii_json view to use json_agg(json_build_object('instance', name, 'versions', iv_name)) instead of the plain json_agg call. That way each row will have a JSON object with an image type and an array of instances, each composed of a name and an array of versions. Is that what you are after?
