We are using Postgres jsonb type in one of our DB tables. Table structure is shown as below:
CREATE TABLE T (
id UUID NOT NULL PRIMARY KEY,
payload JSONB
);
CREATE INDEX ON T USING gin (payload jsonb_path_ops);
Payload
is a complex json string. Below is one example:
{
"business": {
"taxId": "626642071",
"legalName": "Jon's Deli",
"phoneNumbers": [
{
"phoneType": "Business",
"telephoneNumber": "8384407555"
},
{
"phoneType": "Work",
"telephoneNumber": "6032255248"
}
],
"addresses": [
{
"city": "San Francisco",
"state": "CA",
"postalCode": "94101",
"countryCode": "USA",
"addressLine1": "123 Market St"
}
]
},
"stakeholders": [
{
"person": {
"taxId": "540646815",
"firstName": "GdXFouh",
"lastName": "IlUAcgCGz",
"dateOfBirth": "1980-12-11",
"emailAddress": "[email protected]",
"phoneNumbers": [
{
"phoneType": "Mobile",
"telephoneNumber": "4901371573"
}
],
"addresses": [
{
"city": "San Francisco",
"state": "CA",
"postalCode": "94101",
"countryCode": "USA",
"addressLine1": "123 Market St"
}
]
}
}
]
}
Note that phoneNumbers
, addresses
and stakeholders
are arrays, which means there can be multiple elements in the array.
I try to insert one million rows into the table. Each field of payload
is generated randomly. Initially the testing program runs very fast. But after inserting about 800,000 rows, it gets stuck every 1000 rows -- insert 1000 rows, then the testing program is hung for 2 minutes, then it comes back and insert another 1000 rows, ...
We are suspecting this is caused by huge amount of jsonb index updates. Because there are many fields to be updated in the index for a single row. We just want to confirm if anyone has met the same problem.
Actually we don't need to index the whole payload
column. Only certain fields are needed: business->taxId
, business->phoneNumbers-> telephoneNumber
, stakeholders->person->taxId
and stakeholders->person->emailAddress
.
I have tried following two indices:
CREATE INDEX ON T USING gin ((payload->'business'->'taxId') jsonb_path_ops);
CREATE INDEX ON T USING gin ((payload ->'stakeholders'->'person'->'taxId') jsonb_path_ops);
And run two statements:
explain select * from T where payload->'business'->'taxId' @> '"123456789"'; (1)
explain select * from T where payload->'stakeholders'->'person'->'taxId' @> '"123456789"'; (2)
The first statement is using the index. But the second one is doing a full table scan which is very slow. That's why we turn to index the whole payload
column.
Any suggestion is welcome.
BTW, we are using Postgres 9.5.4.
In the case of jsonb, it is comparatively faster to process the data as no reparsing is needed. JSON does not support indexing, unlike jsonb. Jsonb supports indexing to search for the keys or key/ value pairs which is a great advantage at a bigger database jsonb documents.
Most applications should use JSONB for schemaless data. It stores parsed JSON in a binary format, so queries are efficient.
The JSONB data type stores JSON (JavaScript Object Notation) data as a binary representation of the JSONB value, which eliminates whitespace, duplicate keys, and key ordering. JSONB supports GIN indexes.
JSONB stands for “JSON Binary” or “JSON better” depending on whom you ask. It is a decomposed binary format to store JSON. JSONB supports indexing the JSON data, and is very efficient at parsing and querying the JSON data. In most cases, when you work with JSON in PostgreSQL, you should be using JSONB.
Your query:
select * from T where payload->'stakeholders'->'person'->'taxId' @> '"123456789"';
does not work. This is because 'stakeholders' is array. Worked query is:
select * from T where payload->'stakeholders' @> '[{"person": {"taxId": "54"}}]'::jsonb
But in this case postgres can use use index on whole stakeholders.
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=1388.08..1425.90 rows=10 width=36) (actual time=1.959..1.959 rows=1 loops=1)
Recheck Cond: ((payload -> 'stakeholders'::text) @> '[{"person": {"taxId": "54"}}]'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on t_expr_idx3 (cost=0.00..1388.08 rows=10 width=0) (actual time=1.946..1.946 rows=1 loops=1)
Index Cond: ((payload -> 'stakeholders'::text) @> '[{"person": {"taxId": "54"}}]'::jsonb)
Planning time: 0.071 ms
Execution time: 1.978 ms
For using more specific index I use modified approach by: How do you create a Postgresql JSONB array in array index?
CREATE OR REPLACE FUNCTION extract_taxids(a_json jsonb).
RETURNS jsonb AS $BODY$
SELECT jsonb_agg(j) FROM (SELECT jsonb_array_elements(a_json->'stakeholders')->'person'->'taxId' AS j) AS j
$BODY$ LANGUAGE sql IMMUTABLE;
CREATE INDEX ON T USING gin (extract_taxids(payload));
And voila:
EXPLAIN ANALYZE select * from T where extract_taxids(payload) @> '["54"]';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=12.08..52.38 rows=10 width=36) (actual time=0.101..0.102 rows=1 loops=1)
Recheck Cond: (extract_taxids(payload) @> '["54"]'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on t_extract_taxids_idx (cost=0.00..12.07 rows=10 width=0) (actual time=0.008..0.008 rows=1 loops=1)
Index Cond: (extract_taxids(payload) @> '["54"]'::jsonb)
Planning time: 0.128 ms
Execution time: 0.117 ms
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With