Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres - Performance of large jsonb column

We are using Postgres jsonb type in one of our DB tables. Table structure is shown as below:

CREATE TABLE T (
  id UUID NOT NULL PRIMARY KEY,
  payload JSONB
);

CREATE INDEX ON T USING gin (payload jsonb_path_ops);

Payload is a complex json string. Below is one example:

{
    "business": {
        "taxId": "626642071",
        "legalName": "Jon's Deli",
        "phoneNumbers": [
            {
                "phoneType": "Business",
                "telephoneNumber": "8384407555"
            },
            {
                "phoneType": "Work",
                "telephoneNumber": "6032255248"
            }
        ],
        "addresses": [
            {
                "city": "San Francisco",
                "state": "CA",
                "postalCode": "94101",
                "countryCode": "USA",
                "addressLine1": "123 Market St"
            }
        ]
    },
    "stakeholders": [
        {
            "person": {
                "taxId": "540646815",
                "firstName": "GdXFouh",
                "lastName": "IlUAcgCGz",
                "dateOfBirth": "1980-12-11",
                "emailAddress": "[email protected]",
                "phoneNumbers": [
                    {
                        "phoneType": "Mobile",
                        "telephoneNumber": "4901371573"
                    }
                ],
                "addresses": [
                    {
                        "city": "San Francisco",
                        "state": "CA",
                        "postalCode": "94101",
                        "countryCode": "USA",
                        "addressLine1": "123 Market St"
                    }
                ]
            }
        }
    ]
}

Note that phoneNumbers, addresses and stakeholders are arrays, which means there can be multiple elements in the array.

I try to insert one million rows into the table. Each field of payload is generated randomly. Initially the testing program runs very fast. But after inserting about 800,000 rows, it gets stuck every 1000 rows -- insert 1000 rows, then the testing program is hung for 2 minutes, then it comes back and insert another 1000 rows, ...

We are suspecting this is caused by huge amount of jsonb index updates. Because there are many fields to be updated in the index for a single row. We just want to confirm if anyone has met the same problem.


Actually we don't need to index the whole payload column. Only certain fields are needed: business->taxId, business->phoneNumbers-> telephoneNumber, stakeholders->person->taxId and stakeholders->person->emailAddress.

I have tried following two indices:

CREATE INDEX ON T USING gin ((payload->'business'->'taxId') jsonb_path_ops);
CREATE INDEX ON T USING gin ((payload ->'stakeholders'->'person'->'taxId') jsonb_path_ops);

And run two statements:

explain select * from T where payload->'business'->'taxId' @> '"123456789"'; (1)
explain select * from T where payload->'stakeholders'->'person'->'taxId' @> '"123456789"'; (2)

The first statement is using the index. But the second one is doing a full table scan which is very slow. That's why we turn to index the whole payload column.


Any suggestion is welcome.

BTW, we are using Postgres 9.5.4.

like image 378
Jiaguo Fang Avatar asked Feb 02 '17 04:02

Jiaguo Fang


People also ask

Is Jsonb fast?

In the case of jsonb, it is comparatively faster to process the data as no reparsing is needed. JSON does not support indexing, unlike jsonb. Jsonb supports indexing to search for the keys or key/ value pairs which is a great advantage at a bigger database jsonb documents.

Is Jsonb efficient?

Most applications should use JSONB for schemaless data. It stores parsed JSON in a binary format, so queries are efficient.

What is Jsonb column?

The JSONB data type stores JSON (JavaScript Object Notation) data as a binary representation of the JSONB value, which eliminates whitespace, duplicate keys, and key ordering. JSONB supports GIN indexes.

What is Jsonb in Postgres?

JSONB stands for “JSON Binary” or “JSON better” depending on whom you ask. It is a decomposed binary format to store JSON. JSONB supports indexing the JSON data, and is very efficient at parsing and querying the JSON data. In most cases, when you work with JSON in PostgreSQL, you should be using JSONB.


1 Answers

Your query:

select * from T where payload->'stakeholders'->'person'->'taxId' @> '"123456789"';

does not work. This is because 'stakeholders' is array. Worked query is:

select * from T where payload->'stakeholders' @> '[{"person": {"taxId": "54"}}]'::jsonb

But in this case postgres can use use index on whole stakeholders.

                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=1388.08..1425.90 rows=10 width=36) (actual time=1.959..1.959 rows=1 loops=1)
   Recheck Cond: ((payload -> 'stakeholders'::text) @> '[{"person": {"taxId": "54"}}]'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on t_expr_idx3  (cost=0.00..1388.08 rows=10 width=0) (actual time=1.946..1.946 rows=1 loops=1)
         Index Cond: ((payload -> 'stakeholders'::text) @> '[{"person": {"taxId": "54"}}]'::jsonb)
 Planning time: 0.071 ms
 Execution time: 1.978 ms

For using more specific index I use modified approach by: How do you create a Postgresql JSONB array in array index?

CREATE OR REPLACE FUNCTION extract_taxids(a_json jsonb).
RETURNS jsonb AS $BODY$
     SELECT jsonb_agg(j) FROM (SELECT jsonb_array_elements(a_json->'stakeholders')->'person'->'taxId' AS j) AS j
$BODY$ LANGUAGE sql IMMUTABLE;
CREATE INDEX ON T USING gin (extract_taxids(payload));

And voila:

EXPLAIN ANALYZE select * from T where extract_taxids(payload) @> '["54"]';

                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=12.08..52.38 rows=10 width=36) (actual time=0.101..0.102 rows=1 loops=1)
   Recheck Cond: (extract_taxids(payload) @> '["54"]'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on t_extract_taxids_idx  (cost=0.00..12.07 rows=10 width=0) (actual time=0.008..0.008 rows=1 loops=1)
         Index Cond: (extract_taxids(payload) @> '["54"]'::jsonb)
 Planning time: 0.128 ms
 Execution time: 0.117 ms
like image 61
Roman Tkachuk Avatar answered Nov 16 '22 00:11

Roman Tkachuk