
jsonb query with nested objects in an array

I'm using PostgreSQL 9.4 with a table teams containing a jsonb column named json. I am looking for a query that returns all teams that have the players with ids 3, 4 and 7 in their array of players.

The table contains two rows with the following json data:

First row:

{
    "id": 1,
    "name": "foobar",
    "members": {
        "coach": {
            "id": 1,
            "name": "A dude"
        },
        "players": [
            {
                "id": 2,
                "name": "B dude"
            },
            {
                "id": 3,
                "name": "C dude"
            },
            {
                "id": 4,
                "name": "D dude"
            },
            {
                "id": 6,
                "name": "F dude"
            },
            {
                "id": 7,
                "name": "G dude"
            }
        ]
    }
}

Second row:

{
    "id": 2,
    "name": "bazbar",
    "members": {
        "coach": {
            "id": 11,
            "name": "A dude"
        },
        "players": [
            {
                "id": 3,
                "name": "C dude"
            },
            {
                "id": 5,
                "name": "E dude"
            },
            {
                "id": 6,
                "name": "F dude"
            },
            {
                "id": 7,
                "name": "G dude"
            },
            {
                "id": 8,
                "name": "H dude"
            }
        ]
    }
}
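
To reproduce the examples, the two rows can be loaded roughly like this. This is only a sketch: it assumes the table has nothing but the single jsonb column described above, and the real table may well have more columns.

```sql
-- Assumption: teams has just the one jsonb column named json.
CREATE TABLE teams (json jsonb);

INSERT INTO teams (json) VALUES
  ('{"id": 1, "name": "foobar", "members": {
      "coach": {"id": 1, "name": "A dude"},
      "players": [
        {"id": 2, "name": "B dude"}, {"id": 3, "name": "C dude"},
        {"id": 4, "name": "D dude"}, {"id": 6, "name": "F dude"},
        {"id": 7, "name": "G dude"}]}}'),
  ('{"id": 2, "name": "bazbar", "members": {
      "coach": {"id": 11, "name": "A dude"},
      "players": [
        {"id": 3, "name": "C dude"}, {"id": 5, "name": "E dude"},
        {"id": 6, "name": "F dude"}, {"id": 7, "name": "G dude"},
        {"id": 8, "name": "H dude"}]}}');
```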

What does the query have to look like to return the desired list of teams? I tried building an array of player ids with jsonb_array_elements(json -> 'members' -> 'players')->'id' and comparing against it, but the best I managed was a result that matches teams containing any of the given player ids, not all of them.

Timo asked Mar 17 '15 19:03


2 Answers

https://www.postgresql.org/docs/release/14.0/

Subscripting can now be applied to any data type for which it is a useful notation, not only arrays. In this release, the jsonb and hstore types have gained subscripting operators. Let's use the subscripting feature in PostgreSQL 14. Note that the query below names the jsonb column data; with the table from the question, substitute json.

with a as (
    select data['id'] as teamid,
           (jsonb_array_elements(data['members']['players']))['id'] as playerid
    from teams
), b as (
    select teamid, array_agg(playerid) as playerids
    from a
    group by 1
)
select b.* from b where b.playerids @> '{3,4,7}';

returns:

 teamid |  playerids
--------+-------------
 1      | {2,3,4,6,7}

 

DB fiddle

Mark answered Sep 21 '22 10:09


You are facing two non-trivial tasks at once. I am intrigued.

  • Process jsonb with a complex nested structure.
  • Run the equivalent of a relational division query on the document type.

First, register a row type for jsonb_populate_recordset(). You can either create a type permanently with CREATE TYPE, or create a temp table for ad-hoc use (dropped automatically at the end of the session):

CREATE TEMP TABLE foo(id int);  -- just "id", we don't need "name"

We only need the id, so don't include the name. Per documentation:

JSON fields that do not appear in the target row type will be omitted from the output
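
To see that behavior in isolation, here is a small sketch that feeds two of the player objects from the question straight into the function. The literal array is only for illustration; in the real query the input comes from the table.

```sql
-- Same temp table as above; IF NOT EXISTS keeps the snippet re-runnable in one session.
CREATE TEMP TABLE IF NOT EXISTS foo(id int);

SELECT *
FROM   jsonb_populate_recordset(
          null::foo,
          '[{"id": 2, "name": "B dude"}, {"id": 3, "name": "C dude"}]'::jsonb);
-- Returns a single int column "id" with the rows 2 and 3; "name" is dropped.
```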

Query

SELECT t.json->>'id' AS team_id, p.players
FROM   teams t
     , LATERAL (SELECT ARRAY (
         SELECT * FROM jsonb_populate_recordset(null::foo, t.json#>'{members,players}')
         )
       ) AS p(players)
WHERE p.players @> '{3,4,7}';

SQL Fiddle for json in Postgres 9.3 (pg 9.4 not available yet).

Explain

  • Extracts the JSON array with player records:

    t.json#>'{members,players}'
    
  • From these, I unnest rows with just the id with:

    jsonb_populate_recordset(null::foo, t.json#>'{members,players}')
    

    ... and immediately aggregate those into a Postgres array, so we keep one row per row in the base table:

    SELECT ARRAY ( ... )
    
  • All of this happens in a lateral join:

    , LATERAL (SELECT ... ) AS p(players)
    
  • Immediately filter the resulting arrays to keep only the ones we are looking for - with the "contains" array operator @>:

    WHERE p.players @> '{3,4,7}'
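
    The filter step can be checked on its own against the two id lists from the sample rows. A minimal sketch with the arrays typed in by hand (the column aliases are made up for the illustration):

```sql
SELECT '{2,3,4,6,7}'::int[] @> '{3,4,7}'::int[] AS team_1_matches   -- true
     , '{3,5,6,7,8}'::int[] @> '{3,4,7}'::int[] AS team_2_matches;  -- false: 4 is missing
```

    Only an array that holds every requested id passes, which is exactly the "all of them, not any of them" condition asked for.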
    

Voilà.

If you run this query a lot on a big table, you could create a fake IMMUTABLE function that extracts the array as above, and create a functional GIN index based on this function to make this very fast.
"Fake" because the function depends on the underlying row type, i.e. on a catalog lookup, and its result would change if that type changes. (So make sure it does not change.) Similar to this one:

  • Index for finding an element in a JSON array
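
A rough sketch of that idea, not benchmarked. The names player_id, team_player_ids and teams_player_ids_gin are made up for this example, and it uses a permanent type instead of the temp table, since an index must not depend on a session-local object:

```sql
-- Hypothetical permanent row type replacing the temp table foo.
CREATE TYPE player_id AS (id int);

-- Declared IMMUTABLE although jsonb_populate_recordset() is only STABLE:
-- this is the "fake" part, and it is only safe while player_id never changes.
CREATE OR REPLACE FUNCTION team_player_ids(jsonb)
  RETURNS int[]
  LANGUAGE sql IMMUTABLE AS
$func$
SELECT ARRAY (
   SELECT id
   FROM   jsonb_populate_recordset(null::player_id, $1 #> '{members,players}')
   );
$func$;

-- Expression index; the default GIN operator class for arrays supports @>.
CREATE INDEX teams_player_ids_gin ON teams USING gin (team_player_ids(json));

-- The query has to repeat the indexed expression for the index to be considered.
SELECT t.json->>'id' AS team_id
FROM   teams t
WHERE  team_player_ids(t.json) @> '{3,4,7}';
```

On a table with only a handful of rows the planner will most likely still pick a sequential scan; the index only pays off with many rows.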

Aside:
Don't use type names like json as column names (even if that's allowed); it invites tricky syntax errors and confusing error messages.

Erwin Brandstetter answered Sep 24 '22 10:09