jsonb[] vs jsonb where json is an array

I have a Postgres table, mytable, where one of the fields is declared as follows:

myField JSONB[] NOT NULL

and let's assume the JSON objects it holds are of this form:

{ "letter":"A", "digit":30}

What queries should I use to:

  • extract an array of the digit values?
  • extract a json array containing the digit values?
  • extract an array of the digit values where digit > 20?
  • extract a json array of the digit values where digit > 20?

How would the above queries change if I stored the data as a single jsonb value holding a JSON array?

  • Can I still make all the above queries?
  • What would be the performance difference?
  • When should I choose one over the other?
Maths noob asked May 13 '26 15:05


1 Answer

Let's create a table that has both a column of type jsonb[] called pg_array, which will store a Postgres array of JSON objects, and a column of type jsonb called json_array, which will store a JSON array of objects:

CREATE TABLE mytable (id int, pg_array jsonb[], json_array jsonb);
INSERT INTO mytable VALUES
    (1, ARRAY['{"letter":"A", "digit":30}', '{"letter":"B", "digit":31}']::jsonb[], '[{"letter":"A", "digit":30},{"letter":"B", "digit":31}]'),
    (2, ARRAY['{"letter":"X", "digit":40}', '{"letter":"Y", "digit":41}']::jsonb[], '[{"letter":"X", "digit":40},{"letter":"Y", "digit":41}]');

The queries for both approaches will look very similar because we'll be working on the individual array elements, meaning we'll have to unnest and aggregate again.

To unnest pg_array and get each jsonb object:

SELECT unnest(pg_array);

To unnest json_array and get each jsonb object:

SELECT jsonb_array_elements(json_array);

That's the only difference. Thus, the queries below will look almost identical.
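As a minimal sketch of how the two functions are used in the queries that follow: placing the set-returning function in the FROM clause after a comma is an implicit LATERAL join, so it can refer to columns of mytable and produces one output row per array element. WITH ORDINALITY additionally returns each element's 1-based position. The aliases t, elem and ord are made up for this illustration.

```sql
-- One output row per element of the Postgres array, with its position.
SELECT id, t.ord, t.elem
FROM mytable, unnest(pg_array) WITH ORDINALITY AS t(elem, ord);

-- Same shape for the JSON array; only the set-returning function changes.
SELECT id, t.ord, t.elem
FROM mytable, jsonb_array_elements(json_array) WITH ORDINALITY AS t(elem, ord);
```

The ord column can be used in an ORDER BY inside an aggregate if the original element order has to be preserved explicitly.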

On to your first set of questions:

extract an array of the digit values?

db=# SELECT array_agg((x->>'digit')::int) FROM mytable, unnest(pg_array) x GROUP BY id;
 array_agg
-----------
 {40,41}
 {30,31}
(2 rows)

db=# SELECT array_agg((x->>'digit')::int) FROM mytable, jsonb_array_elements(json_array) x GROUP BY id;
 array_agg
-----------
 {40,41}
 {30,31}
(2 rows)

extract a json array containing the digit values?

db=# SELECT jsonb_agg((x->>'digit')::int) FROM mytable, unnest(pg_array) x GROUP BY id;
 jsonb_agg
-----------
 [40, 41]
 [30, 31]
(2 rows)

db=# SELECT jsonb_agg((x->>'digit')::int) FROM mytable, jsonb_array_elements(json_array) x GROUP BY id;
 jsonb_agg
-----------
 [40, 41]
 [30, 31]
(2 rows)

extract an array of the digit values where digit > 20?

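(I've used 30 instead of 20 here, since every digit in the sample data is greater than 20 and the filter would otherwise have no visible effect.)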

db=# SELECT array_agg((x->>'digit')::int) FROM mytable, unnest(pg_array) x WHERE (x->>'digit')::int > 30 GROUP BY id;
 array_agg
-----------
 {40,41}
 {31}
(2 rows)

db=# SELECT array_agg((x->>'digit')::int) FROM mytable, jsonb_array_elements(json_array) x WHERE (x->>'digit')::int > 30 GROUP BY id;
 array_agg
-----------
 {40,41}
 {31}
(2 rows)

extract a json array of the digit values where digit > 20?

(Again with 30 instead of 20, for the same reason: every digit in the sample data is greater than 20.)

db=# SELECT jsonb_agg((x->>'digit')::int) FROM mytable, unnest(pg_array) x WHERE (x->>'digit')::int > 30 GROUP BY id;
 jsonb_agg
-----------
 [40, 41]
 [31]
(2 rows)

db=# SELECT jsonb_agg((x->>'digit')::int) FROM mytable, jsonb_array_elements(json_array) x WHERE (x->>'digit')::int > 30 GROUP BY id;
 jsonb_agg
-----------
 [40, 41]
 [31]
(2 rows)
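
One thing to be aware of with the filtered queries above: because the WHERE clause runs before GROUP BY, a row whose array has no element passing the filter disappears from the result entirely. The following is a sketch of two alternatives that keep such rows and return an empty array for them instead. The first uses a correlated ARRAY(...) subquery on pg_array. The second assumes PostgreSQL 12 or later, where jsonb_path_query_array is available, and works only on the jsonb column. The column alias digits is made up for this illustration.

```sql
-- Postgres array result, one row per table row, empty array when nothing matches.
SELECT id,
       ARRAY(SELECT (x->>'digit')::int
             FROM unnest(pg_array) x
             WHERE (x->>'digit')::int > 30) AS digits
FROM mytable;

-- JSON array result via a SQL/JSON path expression (assumes PostgreSQL 12+).
-- The path selects every element's "digit" and keeps those greater than 30.
SELECT id,
       jsonb_path_query_array(json_array, '$[*].digit ? (@ > 30)') AS digits
FROM mytable;
```

The path-based form needs no unnesting or re-aggregation, which is one case where the two storage choices do lead to different queries.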

For your second set of questions:

Can I still make all the above queries?

As seen above, yes.

What would be the performance difference?

That boils down to the performance difference of unnest and jsonb_array_elements. Let's compare that with a single row that contains an array with 1,000,000 JSON objects:

TRUNCATE mytable;
INSERT INTO mytable
SELECT 1, array_agg(o), jsonb_agg(o)
FROM (SELECT jsonb_build_object('letter', 'A', 'digit', i) o FROM generate_series(1, 1000000) i) x;

phil=# EXPLAIN ANALYZE SELECT unnest(pg_array) FROM mytable;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..35.88 rows=5000 width=32) (actual time=33.357..120.393 rows=1000000 loops=1)
   ->  Seq Scan on mytable  (cost=0.00..10.50 rows=50 width=626) (actual time=0.010..0.013 rows=1 loops=1)
 Planning time: 0.050 ms
 Execution time: 175.670 ms
(4 rows)

phil=# EXPLAIN ANALYZE SELECT jsonb_array_elements(json_array) FROM mytable;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..35.88 rows=5000 width=32) (actual time=257.313..399.883 rows=1000000 loops=1)
   ->  Seq Scan on mytable  (cost=0.00..10.50 rows=50 width=721) (actual time=0.010..0.014 rows=1 loops=1)
 Planning time: 0.047 ms
 Execution time: 455.275 ms
(4 rows)

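In this single run, unnest is about 2.6 times faster than jsonb_array_elements (175.670 ms versus 455.275 ms execution time). Treat that as a rough indication rather than a rigorous benchmark.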
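
Timing the unnesting step is only part of the picture. As a rough sketch, the stored size of the two representations can be compared with pg_column_size, and the complete aggregate queries can be timed the same way as above. No output is shown here because the numbers depend on the server version and settings. The column aliases are made up for this illustration.

```sql
-- Bytes used to store each value (possibly compressed).
SELECT pg_column_size(pg_array)   AS pg_array_bytes,
       pg_column_size(json_array) AS json_array_bytes
FROM mytable;

-- Time the full unnest-and-aggregate round trip for each representation.
EXPLAIN ANALYZE
SELECT array_agg((x->>'digit')::int) FROM mytable, unnest(pg_array) x GROUP BY id;

EXPLAIN ANALYZE
SELECT array_agg((x->>'digit')::int) FROM mytable, jsonb_array_elements(json_array) x GROUP BY id;
```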

When should I choose one over the other?

I assume that your dataset isn't big enough for the difference in performance between unnest and jsonb_array_elements to play a role. Thus, I'd just choose what makes more sense in terms of the data. I'd tend to go with jsonb[] as it more clearly communicates that you'll have an array of json objects.
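
To illustrate the point about communicating intent: a jsonb column accepts any JSON value (an object, a string, a number), so if you go with jsonb and want the database to guarantee that it holds an array, you have to say so yourself. A minimal sketch, with a made-up constraint name:

```sql
-- Reject any non-NULL json_array value that is not a JSON array.
ALTER TABLE mytable
    ADD CONSTRAINT json_array_is_array
    CHECK (jsonb_typeof(json_array) = 'array');
```

With jsonb[] the array part is enforced by the column type itself, although neither approach checks that each element is an object of the expected shape.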

fphilipe answered May 16 '26 04:05