I have a table targeting
that has a column marital_status
of type text[]
and another column data
of type jsonb
. The content of these two columns is the same, just in a different format (it's just for demonstration purposes). Example data:
id | marital_status | data
----+--------------------------+---------------------------------------------------
1 | null | {}
2 | {widowed} | {"marital_status": ["widowed"]}
3 | {never_married,divorced} | {"marital_status": ["never_married", "divorced"]}
...
There are more than 690K records in the table in random combination.
EXPLAIN ANALYZE SELECT marital_status
FROM targeting
WHERE marital_status @> '{widowed}'::text[]
Usually takes < 900ms without creating any indices:
Seq Scan on targeting (cost=0.00..172981.38 rows=159061 width=28) (actual time=0.017..840.084 rows=158877 loops=1)
Filter: (marital_status @> '{widowed}'::text[])
Rows Removed by Filter: 452033
Planning time: 0.150 ms
Execution time: 845.731 ms
With index it usually takes < 200ms (75% improvement):
CREATE INDEX targeting_marital_status_idx ON targeting ("marital_status");
Result:
Index Only Scan using targeting_marital_status_idx on targeting (cost=0.42..23931.35 rows=159061 width=28) (actual time=3.528..143.848 rows=158877 loops=1)"
Filter: (marital_status @> '{widowed}'::text[])
Rows Removed by Filter: 452033
Heap Fetches: 0
Planning time: 0.217 ms
Execution time: 148.506 ms
EXPLAIN ANALYZE SELECT data
FROM targeting
WHERE (data -> 'marital_status') @> '["widowed"]'::jsonb
Usually takes < 5,700ms without creating any indices (more than 6 times slower!):
Seq Scan on targeting (cost=0.00..174508.65 rows=611 width=403) (actual time=0.095..5399.112 rows=158877 loops=1)
Filter: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
Rows Removed by Filter: 452033
Planning time: 0.172 ms
Execution time: 5408.326 ms
With index it usually takers < 3,700ms (35% improvement):
CREATE INDEX targeting_data_marital_status_idx ON targeting USING GIN ((data->'marital_status'));
Result:
Bitmap Heap Scan on targeting (cost=144.73..2482.75 rows=611 width=403) (actual time=85.966..3694.834 rows=158877 loops=1)
Recheck Cond: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
Rows Removed by Index Recheck: 201080
Heap Blocks: exact=33723 lossy=53028
-> Bitmap Index Scan on targeting_data_marital_status_idx (cost=0.00..144.58 rows=611 width=0) (actual time=78.851..78.851 rows=158877 loops=1)"
Index Cond: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
Planning time: 0.257 ms
Execution time: 3703.492 ms
text[]
column so much more performant, even when not using indices?jsonb
column only increase the performance by 35%?jsonb
column?Seems like an easy question. Essentially you're asking how come,
CREATE TABLE foo ( id int, key1 text );
Is faster than
CREATE TABLE bar ( id int, jsonb foo );
@Craig answers it here in the comment
GIN indexing is generally less efficient than a b-tree, so that much is expected.
Also the null value in that schema should read
SELECT jsonb_build_object('marital_status',ARRAY[null]);
jsonb_build_object
----------------------------
{"marital_status": [null]}
(1 row)
And not {}
. PostgreSQL takes numerous shortcuts to make updates to the jsonb objects quick, and to make indexing space-efficient.
If none of this make sense look at this pseudo-table.
CREATE TABLE foo ( id int, x text, y text, z text )
CREATE INDEX ON foo(x);
CREATE INDEX ON foo(y);
CREATE INDEX ON foo(z);
Here we have three btree indexes on the table. Let's look at a similar table..
CREATE TABLE bar ( id int, junk jsonb );
CREATE INDEX ON bar USING gin (junk);
INSERT INTO bar (id,junk) VALUES (1,$${"x": 10, "y": 42}$$);
For bar
to be perform like foo
, we'd need two btrees both of which would be individually larger than the single GIN index we have. And if you did
INSERT INTO bar (id,junk) VALUES (1,$${"x": 10, "y": 42, "z":3}$$);
We'd have to have another btree index on z
, which again would be massive. You can see where I am going with this. jsonb
is great, but the complexity of indexing and schema modeling doesn't parallel the database. You can't just reduce the database to a jsonb column, issue CREATE INDEX
and expect the same performance.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With