
Deep search within jsonb field PostgreSQL

A sample of my data looks something like this:

{
  "city": "NY",
  "skills": [
    {"soft_skills": "Analysis"},
    {"soft_skills": "Procrastination"},
    {"soft_skills": "Presentation"}
  ],
  "areas_of_training": [
    {"areas of training": "Visio"},
    {"areas of training": "Office"},
    {"areas of training": "Risk Assesment"}
  ]
}
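For anyone who wants to reproduce this, a minimal setup might look like the sketch below. Only the table name mydata and the column name content come from the question; the id column and the rest of the layout are assumptions made for illustration.

```sql
-- Hypothetical table layout: mydata and content are from the question,
-- the id primary key is an assumption.
CREATE TABLE mydata (
    id      serial PRIMARY KEY,
    content jsonb NOT NULL
);

-- Insert the sample document exactly as shown above
-- (the spelling "Risk Assesment" is kept so the values match).
INSERT INTO mydata (content) VALUES
('{"city": "NY",
   "skills": [
     {"soft_skills": "Analysis"},
     {"soft_skills": "Procrastination"},
     {"soft_skills": "Presentation"}
   ],
   "areas_of_training": [
     {"areas of training": "Visio"},
     {"areas of training": "Office"},
     {"areas of training": "Risk Assesment"}
   ]}');
```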

I would like to run one query to find users whose soft_skills include Analysis, and another to find users whose areas of training include both Visio and Risk Assesment.

My column type is jsonb. How can I implement a search query on these deeply nested objects? A top-level query on city already works: SELECT * FROM mydata WHERE content::json->>'city'='NY';

How can I also run a match using the LIKE keyword or string matching for deeply nested values?

Churchill asked Jul 09 '17 19:07



2 Answers

1)

SELECT * FROM mydata
WHERE content->'skills' @> '[{"soft_skills": "Analysis"}]';

2)

SELECT * FROM mydata
WHERE content->'areas_of_training' @> '[{"areas of training": "Visio"},{"areas of training": "Risk Assesment"}]';

See the PostgreSQL documentation on JSON(B) operators.

PS: Be prepared for these queries to be slow on large tables. I highly recommend considering data normalization.
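If normalization is not an option, one way to soften the cost is a GIN index on the whole column, with the containment test written against the column itself so the index can be considered. This is only a sketch: the index name idx_mydata_content is hypothetical, and whether the planner actually uses the index depends on table size and statistics.

```sql
-- Hypothetical index name; jsonb_path_ops supports the @> operator.
CREATE INDEX idx_mydata_content
    ON mydata USING gin (content jsonb_path_ops);

-- Same search as query 1, phrased as containment on the indexed column.
SELECT * FROM mydata
WHERE content @> '{"skills": [{"soft_skills": "Analysis"}]}';

-- Same search as query 2: both training areas must be present.
SELECT * FROM mydata
WHERE content @> '{"areas_of_training": [{"areas of training": "Visio"}, {"areas of training": "Risk Assesment"}]}';
```

For documents shaped like the sample, these should return the same rows as the queries above. The difference is that an index on content cannot serve a predicate written on content->'skills'; that form would need a separate expression index.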


Update for LIKE

For your example data it could be:

SELECT * FROM mydata
WHERE EXISTS (
  SELECT *
  FROM jsonb_array_elements(content->'areas_of_training') as a
  WHERE a->>'areas of training' ilike '%vi%');

But the query depends heavily on the actual JSON structure.
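As an illustration of that dependence, the same pattern applied to the skills array only needs a different array key and element key. A sketch, assuming the sample structure from the question; the alias s and the pattern 'Pres%' are chosen purely for the example:

```sql
-- Rows having at least one soft skill that starts with "Pres".
-- LIKE is case-sensitive; use ILIKE for a case-insensitive match.
SELECT * FROM mydata
WHERE EXISTS (
  SELECT 1
  FROM jsonb_array_elements(content->'skills') AS s
  WHERE s->>'soft_skills' LIKE 'Pres%');
```

Because the match sits inside EXISTS, each row of mydata is returned at most once, however many array elements match.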

Abelisto answered Oct 17 '22 08:10


Use jsonb_array_elements() to get the values of nested elements (the column is jsonb, so the json_ variant of the function would not accept it). Examples:

select d.*
from mydata d,
jsonb_array_elements(content->'skills')
where value->>'soft_skills' ilike '%analysis%';

select d.*
from mydata d,
jsonb_array_elements(content->'areas_of_training')
where value->>'areas of training' ~* 'visio|office';

A row is returned once per matching array element, so the query may yield duplicates. It is therefore reasonable to use select distinct on (id), where id is a primary key.

Note that jsonb_array_elements() is costly and, unlike the containment (@>) queries in Abelisto's answer, cannot use an index. However, you have to use it if you want access to the values of nested JSON elements.
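A sketch of the de-duplicated form, assuming the table has a primary key column named id (the answer presumes one, but the question does not show it) and that content is jsonb, which is why the jsonb_ variant of the function is used here. The alias elem is chosen for the example:

```sql
-- One row per user, even when several array elements match the pattern.
SELECT DISTINCT ON (d.id) d.*
FROM mydata d,
     jsonb_array_elements(d.content->'areas_of_training') AS elem
WHERE elem->>'areas of training' ~* 'visio|office'
ORDER BY d.id;
```

With the sample document, both Visio and Office match, so the plain join would return that row twice; DISTINCT ON (d.id) keeps a single copy.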

klin answered Oct 17 '22 08:10