Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql JSONB is coming. What to use now? Hstore? JSON? EAV?

Tags:

After going through the relational DB/NoSQL research debate, I've come to the conclusion that I will be moving forward with PG as my data store. A big part of that decision was the announcement of JSONB coming to 9.4. My question is what should I do now, building an application from the ground up knowing that I want to migrate to (I mean use right now!) jsonb? The DaaS options for me are going to be running 9.3 for a while.

From what I can tell, and correct me if I'm wrong, hstore would run quite a bit faster since I'll be doing a lot of queries of many keys in the hstore column and if I were to use plain json I wouldn't be able to take advantage of indexing/GIN etc. However I could take advantage of nesting with json, but running any queries would be very slow and users would be frustrated.

So, do I build my app around the current version of hstore or json data type, "good ol" EAV or something else? Should I structure my DB and app code a certain way? Any advice would be greatly appreciated. I'm sure others may face the same question as we await the next official release of PostgreSQL.

A few extra details on the app I want to build:

-Very relational (with one exception below)
-Strong social network aspect (groups, friends, likes, timeline etc)
-Based around a single object with variable user assigned attributes, maybe 10 or 1000+ (this is where the schema-less design need comes into play)

Thanks in advance for any input!

like image 917
Mike Avatar asked Apr 04 '14 21:04

Mike


People also ask

Should I use JSON or Jsonb Postgres?

If duplicate keys are specified in the input, only the last value is kept. In general, most applications should prefer to store JSON data as jsonb , unless there are quite specialized needs, such as legacy assumptions about ordering of object keys. RFC 7159 specifies that JSON strings should be encoded in UTF8.

What is the difference between Jsonb and JSON?

Jsonb stores the data as binary code. Basically, it stores the data in binary form which is not an ASCII/ UTF-8 string. Json preserves the original formatting like the whitespaces as well as the ordering of keys. Jsonb does not preserve the original formatting of text like the whitespaces and the ordering of keys.

When should I use JSON in Postgres?

So, the question is: should you use JSON? At the end of the day, Postgres' JSON type simply provides JSON validation on a text field. If you're storing some form of log data you rarely need to query, JSON can work fine. Because it's so simple, it will have a lot higher write throughput.

What is Jsonb data type in PostgreSQL?

The jsonb datatype is an advanced binary storage format with full processing, indexing and searching capabilities, and as such pre-processes the JSON data to an internal format, which does include a single value per key; and also isn't sensible to extra whitespace or indentation.


2 Answers

It depends. If you expect to have a lot of users, a very high transaction volume, or an insane number of attribute fetches per query, I would say use HSTORE. If, however, you app will start small and grow over time, or have relatively few transactions that fetch attributes, or just fetch a few per query, then use JSON. Even in the latter case, if you're not fetching many attributes but checking one or two keys often in the WHERE clause of your queries, you can create a functional index to speed things up:

CREATE INDEX idx_foo_somekey ON foo((bar ->> 'somekey')); 

Now, when you have WHERE bar ->> somekey, it should use the index.

And of course, it will be easier to use nested data and to upgrade to jsonb when it becomes available to you.

So I would lean toward JSON unless you know for sure you're going kick your server's ass with heavy use of key fetches before you have a chance to upgrade to 9.4. But to be sure of that, I would say, do some benchmarking with anticipated query volumes now and see what works best for you.

like image 96
theory Avatar answered Sep 29 '22 05:09

theory


You probably don't give quite enough to give a very detailed answer, but I will say this... If your data is "very relational" then I believe your best course is to build it with a good relational design. If it's just one field with "variable assigned attributes", then that sounds like a good use for an hstore. Which is pretty tried and true at this point. I've been doing some reading on 9.4 and jsonb sounds cool, but, that won't be out for a while. I suspect that a good schema design in 9.3 + a very targeted use of hstore will probably yield a good combination of performance and flexibility.

like image 44
David S Avatar answered Sep 29 '22 07:09

David S