Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are Postgres lookups on jsonb columns so slow?

I have a table targeting that has a column marital_status of type text[] and another column data of type jsonb. The content of these two columns is the same, just in a different format (it's just for demonstration purposes). Example data:

 id |      marital_status      |                        data                       
----+--------------------------+---------------------------------------------------
  1 | null                     | {}
  2 | {widowed}                | {"marital_status": ["widowed"]}
  3 | {never_married,divorced} | {"marital_status": ["never_married", "divorced"]}
...

There are more than 690K records in the table in random combination.

Lookup on text[] column

EXPLAIN ANALYZE SELECT marital_status
FROM targeting
WHERE marital_status @> '{widowed}'::text[]

Without index

Usually takes < 900ms without creating any indices:

Seq Scan on targeting  (cost=0.00..172981.38 rows=159061 width=28) (actual time=0.017..840.084 rows=158877 loops=1)
  Filter: (marital_status @> '{widowed}'::text[])
  Rows Removed by Filter: 452033
Planning time: 0.150 ms
Execution time: 845.731 ms

With index

With index it usually takes < 200ms (75% improvement):

CREATE INDEX targeting_marital_status_idx ON targeting ("marital_status");

Result:

Index Only Scan using targeting_marital_status_idx on targeting  (cost=0.42..23931.35 rows=159061 width=28) (actual time=3.528..143.848 rows=158877 loops=1)"
  Filter: (marital_status @> '{widowed}'::text[])
  Rows Removed by Filter: 452033
  Heap Fetches: 0
Planning time: 0.217 ms
Execution time: 148.506 ms

Lookup on jsonb column

EXPLAIN ANALYZE SELECT data
FROM targeting
WHERE (data -> 'marital_status') @> '["widowed"]'::jsonb

Without index

Usually takes < 5,700ms without creating any indices (more than 6 times slower!):

Seq Scan on targeting  (cost=0.00..174508.65 rows=611 width=403) (actual time=0.095..5399.112 rows=158877 loops=1)
  Filter: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
  Rows Removed by Filter: 452033
Planning time: 0.172 ms
Execution time: 5408.326 ms

With index

With index it usually takers < 3,700ms (35% improvement):

CREATE INDEX targeting_data_marital_status_idx ON targeting USING GIN ((data->'marital_status'));

Result:

Bitmap Heap Scan on targeting  (cost=144.73..2482.75 rows=611 width=403) (actual time=85.966..3694.834 rows=158877 loops=1)
  Recheck Cond: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
  Rows Removed by Index Recheck: 201080
  Heap Blocks: exact=33723 lossy=53028
  ->  Bitmap Index Scan on targeting_data_marital_status_idx  (cost=0.00..144.58 rows=611 width=0) (actual time=78.851..78.851 rows=158877 loops=1)"
        Index Cond: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
Planning time: 0.257 ms
Execution time: 3703.492 ms

Questions

  • Why is the text[] column so much more performant, even when not using indices?
  • Why does adding indices to the jsonb column only increase the performance by 35%?
  • Is there a more perfomant way of doing the lookup on the jsonb column?
like image 883
satoshi Avatar asked Oct 29 '22 15:10

satoshi


1 Answers

Seems like an easy question. Essentially you're asking how come,

CREATE TABLE foo ( id int, key1 text );

Is faster than

CREATE TABLE bar ( id int, jsonb foo );

@Craig answers it here in the comment

GIN indexing is generally less efficient than a b-tree, so that much is expected.

Also the null value in that schema should read

SELECT jsonb_build_object('marital_status',ARRAY[null]);
     jsonb_build_object     
----------------------------
 {"marital_status": [null]}
(1 row)

And not {}. PostgreSQL takes numerous shortcuts to make updates to the jsonb objects quick, and to make indexing space-efficient.

If none of this make sense look at this pseudo-table.

CREATE TABLE foo ( id int, x text, y text, z text )
CREATE INDEX ON foo(x);
CREATE INDEX ON foo(y);
CREATE INDEX ON foo(z);

Here we have three btree indexes on the table. Let's look at a similar table..

CREATE TABLE bar ( id int, junk jsonb );
CREATE INDEX ON bar USING gin (junk);
INSERT INTO bar (id,junk) VALUES (1,$${"x": 10, "y": 42}$$);

For bar to be perform like foo, we'd need two btrees both of which would be individually larger than the single GIN index we have. And if you did

INSERT INTO bar (id,junk) VALUES (1,$${"x": 10, "y": 42, "z":3}$$);

We'd have to have another btree index on z, which again would be massive. You can see where I am going with this. jsonb is great, but the complexity of indexing and schema modeling doesn't parallel the database. You can't just reduce the database to a jsonb column, issue CREATE INDEX and expect the same performance.

like image 61
NO WAR WITH RUSSIA Avatar answered Nov 15 '22 06:11

NO WAR WITH RUSSIA