I just discovered JSONB for PostgreSQL and was wondering what could go wrong if I used it for all my tables' columns ?
That is to say all my tables would have primary and foreign keys as columns and a field
column of type JSONB for any other data.
Besides taking up extra space because of JSONB's overhead, and losing typing on "columns", what would I miss ?
It turns out you're on to something here.
The major points of using a relational database.
You get to keep the relationships. But you lose the schema and a lot of the performance. The schema is more than just data validation. It means you can't use triggers or constraints on individual fields.
As for performance... you'll notice that most tests of JSONB performance are against other similar data types. They're never against normal SQL tables. That's because, while JSONB is astonishingly efficient, its not nearly as efficient as regular SQL. So let's test it, it turns out you're on to something here.
Using the dataset from this JSONB performance presentation I created a proper SQL schema...
create table customers (
id text primary key
);
create table products (
id text primary key,
title text,
sales_rank integer,
"group" text,
category text,
subcategory text,
similar_ids text[]
);
create table reviews (
customer_id text references customers(id),
product_id text references products(id),
"date" timestamp,
rating integer,
votes integer,
helpful_votes integer
);
And one that uses SQL relationships but JSONB for data...
create table customers (
id text primary key
);
create table products_jb (
id text primary key,
fields jsonb
);
create table reviews_jb (
customer_id text references customers(id),
product_id text references products_jb(id),
fields jsonb
);
And a single JSONB table.
create table reviews_jsonb (
review jsonb
);
Then I imported the same data into both sets of tables using a little script. 589859 reviews, 93319 products, 98761 customers.
Let's try the same query as in the JSONB performance article, getting the average review for a product category. First, without indexes.
Traditional SQL: 138 ms
test=> select round(avg(r.rating), 2)
from reviews r
join products p on p.id = r.product_id
where p.category = 'Home & Garden';
round
-------
4.59
(1 row)
Time: 138.631 ms
Full JSONB: 380 ms
test=> select round(avg((review#>>'{review,rating}')::numeric),2)
test-> from reviews_jsonb
test-> where review #>>'{product,category}' = 'Home & Garden';
round
-------
4.59
(1 row)
Time: 380.697 ms
Hybrid JSONB: 190 ms
test=> select round(avg((r.fields#>>'{rating}')::numeric),2)
from reviews_jb r
join products_jb p on p.id = r.product_id
where p.fields#>>'{category}' = 'Home & Garden';
round
-------
4.59
(1 row)
Time: 192.333 ms
That honestly went better than it thought. The hybrid approach is twice as fast as full JSONB, but 50% slower than normal SQL. Now how about with indexes?
Traditional SQL: 130 ms (+500 ms for the index)
test=> create index products_category on products(category);
CREATE INDEX
Time: 491.969 ms
test=> select round(avg(r.rating), 2)
from reviews r
join products p on p.id = r.product_id
where p.category = 'Home & Garden';
round
-------
4.59
(1 row)
Time: 128.212 ms
Full JSONB: 360 ms (+ 25000 ms for the index)
test=> create index on reviews_jsonb using gin(review);
CREATE INDEX
Time: 25253.348 ms
test=> select round(avg((review#>>'{review,rating}')::numeric),2)
from reviews_jsonb
where review #>>'{product,category}' = 'Home & Garden';
round
-------
4.59
(1 row)
Time: 363.222 ms
Hybrid JSONB: 185 ms (+6900 ms for the indexes)
test=> create index on products_jb using gin(fields);
CREATE INDEX
Time: 3654.894 ms
test=> create index on reviews_jb using gin(fields);
CREATE INDEX
Time: 3237.534 ms
test=> select round(avg((r.fields#>>'{rating}')::numeric),2)
from reviews_jb r
join products_jb p on p.id = r.product_id
where p.fields#>>'{category}' = 'Home & Garden';
round
-------
4.59
(1 row)
Time: 183.679 ms
It turns out this is a query indexing isn't going to be much help for.
That's what I see playing with the data a bit, Hybrid JSONB is always slower than Full SQL, but faster than Full JSONB. It seems like a good compromise. You get to use traditional foreign keys and joins, but have the flexibility of adding whatever fields you like.
I recommend taking the hybrid approach a step further: use SQL columns for the fields you know are going to be there, and have a JSONB column to pick up any additional fields for flexibility.
I encourage you to play around with the test data here and see what the performance is like.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With