Recent PostgreSQL versions have introduced various features for JSON content, but I'm not sure whether I really should use them - there is no established "best practice" yet for what works and what doesn't, or at least I can't find one.
I have a specific example: a table about objects which, among other things, contains a list of alternate names for each object. All that data will also be included in a JSON column for retrieval purposes. For example (skipping all the other irrelevant fields):
create table stuff (id serial primary key, data json);
insert into stuff (data)
values ('{"AltNames": ["Name1", "Name2", "Name3"]}');
I will need some queries along the lines of "list all objects where one of the altnames is 'foobar'". The expected table size is on the order of a few million records. Postgres JSON queries can be used for that, and the expression can also be indexed (see, for example, "Index for finding an element in a JSON array"). However, SHOULD it be done that way, or is it a perverse workaround that's not recommended?
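For concreteness, here is the kind of query and index I have in mind - a sketch assuming PostgreSQL 9.4+ and that the column were declared jsonb rather than json (with plain json an expression index would be needed instead):

-- a GIN index that supports jsonb containment queries
create index stuff_data_gin on stuff using gin (data jsonb_path_ops);

-- "list all objects where one of AltNames is 'foobar'";
-- the containment operator @> can use the GIN index above
select *
from stuff
where data @> '{"AltNames": ["foobar"]}';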
The classic alternative, of course, is to add another table for that one-to-many relation, holding the name and a foreign key to the main table; the performance of that is well understood. However, it has its own disadvantages: it means either duplicating data between that table and the JSON (with a possible integrity risk), or building the JSON return value dynamically on every request, which has its own performance penalty.
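Sketched out (with illustrative table and column names; json_agg is available from 9.3, json_build_object from 9.4), that alternative would look something like this:

create table obj (id serial primary key /* ...other fields... */);
create table obj_altnames (
    obj_id int  not null references obj (id),
    name   text not null
);
create index on obj_altnames (name);

-- "one of altnames is 'foobar'":
select o.*
from obj o
join obj_altnames a on a.obj_id = o.id
where a.name = 'foobar';

-- building the JSON return value dynamically at request time:
select o.id, json_build_object('AltNames', json_agg(a.name)) as data
from obj o
join obj_altnames a on a.obj_id = o.id
group by o.id;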
A note on json vs jsonb is in order here. jsonb stores parsed JSON in a binary format, so queries process significantly faster: the binary representation lets Postgres access a particular key/value pair without reading the entire JSON record. The trade-off is on input: plain json is stored as-is and needs no conversion, while jsonb pays a slight write overhead for the binary conversion (neither requires any change to the schema design). Even so, accessing jsonb values is still about 2× slower than accessing a bytea column, and queries on hstore values are slightly faster (~10-20%), so if performance is critical and string key/value pairs are sufficient, hstore is worth considering. As general guidelines: don't use JSON for data that can easily be stored in ordinary database tables; avoid large JSON objects if you want to modify individual attributes; and don't use JSON if you want to use its attributes in complicated WHERE conditions.
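To make that last guideline concrete, here is a hypothetical example (the Price and Category keys are made up): an attribute buried in JSON has to be extracted and cast in every WHERE clause, and the planner has no statistics for it unless you add a matching expression index.

-- awkward: extract as text, then cast, for every condition
select *
from stuff
where (data ->> 'Price')::numeric > 100
  and data ->> 'Category' = 'tools';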
It can be done that way, but that doesn't mean you should. In some sense, the best practice is already well documented (see, e.g., the discussions of using hstore vs XML vs EAV vs a separate table): this is a new datatype which, for all intents and practical purposes (besides validation and syntax), is no different from prior unstructured or semi-structured options.
Put another way, it's the same old pig with new makeup.
JSON offers the ability to use inverted (GIN) indexes, in the same way as hstore, array types and tsvectors do. They work fine, but keep in mind that they're primarily designed for extracting points in a neighborhood (think geometry types) ordered by distance, rather than for extracting a list of values in lexicographical order.
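For instance, the same GIN machinery serves all of these types (the table and column names here are hypothetical; hstore requires its extension to be installed):

create index on articles using gin (tags);                          -- text[] array
create index on products using gin (attrs);                         -- hstore
create index on docs using gin (to_tsvector('english', body));      -- full-text search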
To illustrate, take the two plans that Roman's answer outlines below: the JSON query resorts to a bitmap index scan plus a heap recheck against a rough row estimate, while the separate names table is served by a nested loop over two precise index scans - and runs in roughly half the time (15ms vs 30ms).
Getting back to your question: cluttered and oversized as they may get, inverted indexes will indeed improve the performance of your app if you use Postgres tables as giant JSON stores. But they're not a silver bullet either, and they won't get you as far as proper relational design when dealing with bottlenecks.
The bottom line, in the end, is no different from what you'd reach when deciding to use hstore or an EAV model: keep data you need to query and constrain in proper relational tables, and treat the JSON column as a convenience for retrieval.
I'd say it's worth a try. I created a small test (100,000 records, with ~10 elements in each JSON array) and checked how it works:
create table test1 (id serial primary key, data json);
create table test1_altnames (id int, name text);

-- helper: turn a JSON array into text[] so it can be GIN-indexed
create or replace function array_from_json(_j json)
returns text[] as
$func$
    select array_agg(x.elem::text)
    from json_array_elements(_j) as x(elem)
$func$ language sql immutable;

-- generate ~100,000 rows with ~10 alternate names each
with cte as (
    select (random() * 100000)::int as grp,
           (random() * 1000000)::int as name
    from generate_series(1, 1000000)
), cte2 as (
    select array_agg(name) as "AltNames"
    from cte
    group by grp
)
insert into test1 (data)
select row_to_json(t)
from cte2 as t;

-- mirror the same data into the separate names table
insert into test1_altnames (id, name)
select id, json_array_elements(data -> 'AltNames')::text
from test1;

create index ix_test1 on test1 using gin (array_from_json(data -> 'AltNames'));
create index ix_test1_altnames on test1_altnames (name);
Query the JSON (30ms on my machine):
select * from test1 where '{489147}' <@ array_from_json(data -> 'AltNames');

"Bitmap Heap Scan on test1  (cost=224.13..1551.41 rows=500 width=36)"
"  Recheck Cond: ('{489147}'::text[] <@ array_from_json((data -> 'AltNames'::text)))"
"  ->  Bitmap Index Scan on ix_test1  (cost=0.00..224.00 rows=500 width=0)"
"        Index Cond: ('{489147}'::text[] <@ array_from_json((data -> 'AltNames'::text)))"
Query the table with names (15ms on my machine):
select * from test1 as t
where t.id in (select tt.id from test1_altnames as tt where tt.name = '489147');

"Nested Loop  (cost=12.76..20.80 rows=2 width=36)"
"  ->  HashAggregate  (cost=12.46..12.47 rows=1 width=4)"
"        ->  Index Scan using ix_test1_altnames on test1_altnames tt  (cost=0.42..12.46 rows=2 width=4)"
"              Index Cond: (name = '489147'::text)"
"  ->  Index Scan using test1_pkey on test1 t  (cost=0.29..8.31 rows=1 width=36)"
"        Index Cond: (id = tt.id)"
I should also note that there is some cost to inserting/deleting rows in the table with names (test1_altnames), so it's a bit more involved than just selecting rows. Personally, I like the JSON solution.
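To give an idea of the bookkeeping involved - hypothetically, for a row with id = 42, every change to the JSON has to be mirrored into test1_altnames, typically in one transaction (or via a trigger):

begin;

update test1
   set data = '{"AltNames": [123456, 654321]}'
 where id = 42;

-- keep the names table in sync with the new JSON
delete from test1_altnames where id = 42;
insert into test1_altnames (id, name)
select id, json_array_elements(data -> 'AltNames')::text
from test1
where id = 42;
-- (numeric names, as in the test data above; for string names,
--  json_array_elements_text from 9.4 would avoid the JSON quoting)

commit;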