Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use JSONB for postgres columns other than primary and foreign keys

I just discovered JSONB for PostgreSQL and was wondering what could go wrong if I used it for all my tables' columns ?

That is to say all my tables would have primary and foreign keys as columns and a field column of type JSONB for any other data.

Besides taking up extra space because of JSONB's overhead, and losing typing on "columns", what would I miss ?

like image 964
dynamic_cast Avatar asked Feb 15 '17 18:02

dynamic_cast


1 Answers

It turns out you're on to something here.

The major points of using a relational database.

  • Well defined relationships.
  • A well defined and detailed schema.
  • High performance for large data sets.

You get to keep the relationships. But you lose the schema and a lot of the performance. The schema is more than just data validation. It means you can't use triggers or constraints on individual fields.

As for performance... you'll notice that most tests of JSONB performance are against other similar data types. They're never against normal SQL tables. That's because, while JSONB is astonishingly efficient, its not nearly as efficient as regular SQL. So let's test it, it turns out you're on to something here.

Using the dataset from this JSONB performance presentation I created a proper SQL schema...

create table customers (
    id text primary key
);

create table products (
    id text primary key,
    title text,
    sales_rank integer,
    "group" text,
    category text,
    subcategory text,
    similar_ids text[]
);

create table reviews (
    customer_id text references customers(id),
    product_id text references products(id),
    "date" timestamp,
    rating integer,
    votes integer,
    helpful_votes integer
);

And one that uses SQL relationships but JSONB for data...

create table customers (
    id text primary key
);

create table products_jb (
    id text primary key,
    fields jsonb
);

create table reviews_jb (
    customer_id text references customers(id),
    product_id text references products_jb(id),
    fields jsonb
);

And a single JSONB table.

create table reviews_jsonb (
    review jsonb
);

Then I imported the same data into both sets of tables using a little script. 589859 reviews, 93319 products, 98761 customers.

Let's try the same query as in the JSONB performance article, getting the average review for a product category. First, without indexes.

Traditional SQL: 138 ms

test=> select round(avg(r.rating), 2)
from reviews r
join products p on p.id = r.product_id
where p.category = 'Home & Garden';
 round 
-------
  4.59
(1 row)

Time: 138.631 ms

Full JSONB: 380 ms

test=> select round(avg((review#>>'{review,rating}')::numeric),2)
test-> from reviews_jsonb
test-> where review #>>'{product,category}' = 'Home & Garden';
 round 
-------
  4.59
(1 row)

Time: 380.697 ms

Hybrid JSONB: 190 ms

test=> select round(avg((r.fields#>>'{rating}')::numeric),2)
from reviews_jb r
join products_jb p on p.id = r.product_id
where p.fields#>>'{category}' = 'Home & Garden';
 round 
-------
  4.59
(1 row)

Time: 192.333 ms

That honestly went better than it thought. The hybrid approach is twice as fast as full JSONB, but 50% slower than normal SQL. Now how about with indexes?

Traditional SQL: 130 ms (+500 ms for the index)

test=> create index products_category on products(category);
CREATE INDEX
Time: 491.969 ms

test=> select round(avg(r.rating), 2)
from reviews r
join products p on p.id = r.product_id
where p.category = 'Home & Garden';
 round 
-------
  4.59
(1 row)

Time: 128.212 ms

Full JSONB: 360 ms (+ 25000 ms for the index)

test=> create index on reviews_jsonb using gin(review);
CREATE INDEX
Time: 25253.348 ms
test=> select round(avg((review#>>'{review,rating}')::numeric),2)
from reviews_jsonb
where review #>>'{product,category}' = 'Home & Garden';
 round 
-------
  4.59
(1 row)

Time: 363.222 ms

Hybrid JSONB: 185 ms (+6900 ms for the indexes)

test=> create index on products_jb using gin(fields);
CREATE INDEX
Time: 3654.894 ms
test=> create index on reviews_jb using gin(fields);
CREATE INDEX
Time: 3237.534 ms
test=> select round(avg((r.fields#>>'{rating}')::numeric),2)
from reviews_jb r
join products_jb p on p.id = r.product_id
where p.fields#>>'{category}' = 'Home & Garden';
 round 
-------
  4.59
(1 row)

Time: 183.679 ms

It turns out this is a query indexing isn't going to be much help for.

That's what I see playing with the data a bit, Hybrid JSONB is always slower than Full SQL, but faster than Full JSONB. It seems like a good compromise. You get to use traditional foreign keys and joins, but have the flexibility of adding whatever fields you like.

I recommend taking the hybrid approach a step further: use SQL columns for the fields you know are going to be there, and have a JSONB column to pick up any additional fields for flexibility.

I encourage you to play around with the test data here and see what the performance is like.

like image 125
Schwern Avatar answered Oct 21 '22 13:10

Schwern