Are Postgres JSON indexes efficient enough compared with classic normalized tables?

Recent PostgreSQL versions have introduced various features for JSON content, but I'm unsure whether I really should use them: there is no established "best practice" yet for what works and what doesn't, or at least I can't find one.

Here is a specific example: I have a table of objects which, among other things, contains a list of alternate names for each object. All that data will also be included in a JSON column for retrieval purposes. For example (skipping the other, irrelevant fields):

create table stuff (id serial primary key, data json);
insert into stuff (data) values ('{"AltNames":["Name1","Name2","Name3"]}');

I will need some queries in the form "list all objects where one of altnames is 'foobar'." The expected table size is on the order of a few million records. Postgres JSON queries can be used for that, and it can also be indexed (Index for finding an element in a JSON array, for example). However, SHOULD it be done that way or is it a perverse workaround that's not recommended?
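
For illustration, here is a minimal sketch of what such a query could look like, assuming the column were declared as jsonb (added in PostgreSQL 9.4, after this question was asked) so that a GIN index can cover it directly; the table name and 'foobar' are placeholders:

-- Sketch only: assumes a jsonb column instead of json.
create table stuff_jsonb (id serial primary key, data jsonb);
create index ix_stuff_data on stuff_jsonb using gin (data);

-- Rows whose AltNames array contains 'foobar'; can use the GIN index.
select id from stuff_jsonb where data @> '{"AltNames": ["foobar"]}';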

The classic alternative, of course, is to add a separate table for that one-to-many relation, containing the name and a foreign key to the main table; the performance of that is well understood. However, it has its own disadvantages: it means either duplicating data between that table and the JSON (with a possible integrity risk), or building the JSON return data dynamically on every request, which carries its own performance penalty.
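
For concreteness, that classic alternative might look like the following sketch (the child table and index names are made up):

-- One-to-many side table; names are illustrative.
create table stuff_altnames (
    stuff_id integer not null references stuff(id),
    name     text not null
);
create index ix_stuff_altnames_name on stuff_altnames (name);

-- "List all objects where one of the alternate names is 'foobar'":
select s.*
from stuff s
join stuff_altnames a on a.stuff_id = s.id
where a.name = 'foobar';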

Asked Sep 14 '13 by Peteris



2 Answers

I will need some queries in the form "list all objects where one of altnames is 'foobar'." The expected table size is on the order of a few million records. Postgres JSON queries can be used for that, and it can also be indexed (Index for finding an element in a JSON array, for example). However, SHOULD it be done that way or is it a perverse workaround that's not recommended?

It can be done that way, but that doesn't mean you should. In some sense, the best practice is already well documented (see e.g. the discussions of using hstore vs using XML vs using EAV vs using a separate table): this is a new datatype which, for all practical intents and purposes (validation and syntax aside), is no different from the prior unstructured or semi-structured options.

Put another way, it's the same old pig with new makeup.

JSON offers the ability to use inverted search tree indexes, in the same way as hstore, array types and tsvectors do. They work fine, but keep in mind that they're primarily designed for extracting points in a neighborhood (think geometry types) ordered by distance, rather than for extracting a list of values in lexicographical order.
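
To make that concrete, the same kind of inverted (GIN) index works over a plain array column; a sketch with made-up names:

-- GIN indexes arrays, hstore, tsvector and json-derived values alike.
create table tags_demo (id serial primary key, tags text[]);
create index ix_tags_demo on tags_demo using gin (tags);

-- Typically answered with a bitmap index scan, as described below.
select id from tags_demo where tags @> array['foobar'];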

To illustrate, take the two plans that Roman's answer outlines:

  • The one that does an index scan plows through disk pages directly, retrieving the rows in the order indicated by the index.
  • The one that does a bitmap index scan starts by identifying every disk page that might contain a row, and reads them in the order they appear on disk, as if it were (in fact, precisely like) doing a sequential scan that skips useless areas.

Getting back to your question: Cluttered and oversized inverted tree indexes will indeed improve the performance of your app if you use Postgres tables as giant JSON stores. But they're not a silver bullet either, and they won't get you as far as proper relational design when dealing with bottlenecks.

The bottom line, in the end, is no different from what you'd get when deciding to use hstore or an EAV:

  1. If it needs an index (i.e. it frequently appears in a where clause or, even more importantly, in a join clause), you likely want the data in a separate field; one way to keep such a field in sync with the JSON copy is sketched below this list.
  2. If it's primarily cosmetic, JSON/hstore/EAV/XML/whatever-makes-you-sleep-at-night works fine.
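
Applied to the question, one way to satisfy point 1 while still keeping the JSON copy for retrieval is to maintain the side table from a trigger. A hedged sketch, assuming PostgreSQL 9.4+ for json_array_elements_text and reusing the stuff/stuff_altnames tables sketched in the question above (all names here are illustrative):

-- Hypothetical sync trigger: rebuild the side-table rows for this object
-- from the JSON column on every insert or update.
create or replace function sync_altnames() returns trigger as $$
begin
    delete from stuff_altnames where stuff_id = new.id;
    insert into stuff_altnames (stuff_id, name)
    select new.id, json_array_elements_text(new.data -> 'AltNames');
    return new;
end;
$$ language plpgsql;

create trigger trg_stuff_altnames
after insert or update on stuff
for each row execute procedure sync_altnames();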
Answered Sep 28 '22 by Denis de Bernardy


I'd say it's worth a try. I created a test (100,000 records, ~10 elements in each JSON array) and checked how it works:

create table test1 (id serial primary key, data json);
create table test1_altnames (id int, name text);

create or replace function array_from_json(_j json) returns text[] as
$func$
    select array_agg(x.elem::text)
    from json_array_elements(_j) as x(elem)
$func$ language sql immutable;

with cte as (
    select
        (random() * 100000)::int as grp,
        (random() * 1000000)::int as name
    from generate_series(1, 1000000)
), cte2 as (
    select array_agg(name) as "AltNames"
    from cte
    group by grp
)
insert into test1 (data)
select row_to_json(t) from cte2 as t;

insert into test1_altnames (id, name)
select id, json_array_elements(data->'AltNames')::text from test1;

create index ix_test1 on test1 using gin (array_from_json(data->'AltNames'));
create index ix_test1_altnames on test1_altnames (name);

Query JSON (30ms on my machine):

select * from test1 where '{489147}' <@ array_from_json(data->'AltNames');

"Bitmap Heap Scan on test1  (cost=224.13..1551.41 rows=500 width=36)"
"  Recheck Cond: ('{489147}'::text[] <@ array_from_json((data -> 'AltNames'::text)))"
"  ->  Bitmap Index Scan on ix_test1  (cost=0.00..224.00 rows=500 width=0)"
"        Index Cond: ('{489147}'::text[] <@ array_from_json((data -> 'AltNames'::text)))"

Query table with names (15ms on my machine):

select * from test1 as t
where t.id in (select tt.id from test1_altnames as tt where tt.name = '489147');

"Nested Loop  (cost=12.76..20.80 rows=2 width=36)"
"  ->  HashAggregate  (cost=12.46..12.47 rows=1 width=4)"
"        ->  Index Scan using ix_test1_altnames on test1_altnames tt  (cost=0.42..12.46 rows=2 width=4)"
"              Index Cond: (name = '489147'::text)"
"  ->  Index Scan using test1_pkey on test1 t  (cost=0.29..8.31 rows=1 width=36)"
"        Index Cond: (id = tt.id)"

I should also note that there is some cost to inserting/deleting rows in the table with names (test1_altnames), so it's a bit more involved than just selecting rows. Personally, I like the JSON solution.

Answered Sep 28 '22 by Roman Pekar