Postgres - Performance of large jsonb column

Tags:

We are using Postgres jsonb type in one of our DB tables. Table structure is shown as below:

CREATE TABLE T (
  id UUID NOT NULL PRIMARY KEY,
  payload JSONB
);

CREATE INDEX ON T USING gin (payload jsonb_path_ops);

Payload is a complex json string. Below is one example:

{
    "business": {
        "taxId": "626642071",
        "legalName": "Jon's Deli",
        "phoneNumbers": [
            {
                "phoneType": "Business",
                "telephoneNumber": "8384407555"
            },
            {
                "phoneType": "Work",
                "telephoneNumber": "6032255248"
            }
        ],
        "addresses": [
            {
                "city": "San Francisco",
                "state": "CA",
                "postalCode": "94101",
                "countryCode": "USA",
                "addressLine1": "123 Market St"
            }
        ]
    },
    "stakeholders": [
        {
            "person": {
                "taxId": "540646815",
                "firstName": "GdXFouh",
                "lastName": "IlUAcgCGz",
                "dateOfBirth": "1980-12-11",
                "emailAddress": "[email protected]",
                "phoneNumbers": [
                    {
                        "phoneType": "Mobile",
                        "telephoneNumber": "4901371573"
                    }
                ],
                "addresses": [
                    {
                        "city": "San Francisco",
                        "state": "CA",
                        "postalCode": "94101",
                        "countryCode": "USA",
                        "addressLine1": "123 Market St"
                    }
                ]
            }
        }
    ]
}

Note that phoneNumbers, addresses and stakeholders are arrays, which means there can be multiple elements in the array.

I try to insert one million rows into the table. Each field of payload is generated randomly. Initially the testing program runs very fast. But after inserting about 800,000 rows, it gets stuck every 1000 rows -- insert 1000 rows, then the testing program is hung for 2 minutes, then it comes back and insert another 1000 rows, ...

We are suspecting this is caused by huge amount of jsonb index updates. Because there are many fields to be updated in the index for a single row. We just want to confirm if anyone has met the same problem.

Actually we don't need to index the whole payload column. Only certain fields are needed: business->taxId, business->phoneNumbers-> telephoneNumber, stakeholders->person->taxId and stakeholders->person->emailAddress.

I have tried following two indices:

CREATE INDEX ON T USING gin ((payload->'business'->'taxId') jsonb_path_ops);
CREATE INDEX ON T USING gin ((payload ->'stakeholders'->'person'->'taxId') jsonb_path_ops);

And run two statements:

explain select * from T where payload->'business'->'taxId' @> '"123456789"'; (1)
explain select * from T where payload->'stakeholders'->'person'->'taxId' @> '"123456789"'; (2)

The first statement is using the index. But the second one is doing a full table scan which is very slow. That's why we turn to index the whole payload column.

Any suggestion is welcome.

BTW, we are using Postgres 9.5.4.

378

asked Feb 02 '17 04:02

Jiaguo Fang

1 Answers

Your query:

select * from T where payload->'stakeholders'->'person'->'taxId' @> '"123456789"';

does not work. This is because 'stakeholders' is array. Worked query is:

select * from T where payload->'stakeholders' @> '[{"person": {"taxId": "54"}}]'::jsonb

But in this case postgres can use use index on whole stakeholders.

                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=1388.08..1425.90 rows=10 width=36) (actual time=1.959..1.959 rows=1 loops=1)
   Recheck Cond: ((payload -> 'stakeholders'::text) @> '[{"person": {"taxId": "54"}}]'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on t_expr_idx3  (cost=0.00..1388.08 rows=10 width=0) (actual time=1.946..1.946 rows=1 loops=1)
         Index Cond: ((payload -> 'stakeholders'::text) @> '[{"person": {"taxId": "54"}}]'::jsonb)
 Planning time: 0.071 ms
 Execution time: 1.978 ms

For using more specific index I use modified approach by: How do you create a Postgresql JSONB array in array index?

CREATE OR REPLACE FUNCTION extract_taxids(a_json jsonb).
RETURNS jsonb AS $BODY$
     SELECT jsonb_agg(j) FROM (SELECT jsonb_array_elements(a_json->'stakeholders')->'person'->'taxId' AS j) AS j
$BODY$ LANGUAGE sql IMMUTABLE;
CREATE INDEX ON T USING gin (extract_taxids(payload));

And voila:

EXPLAIN ANALYZE select * from T where extract_taxids(payload) @> '["54"]';

                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=12.08..52.38 rows=10 width=36) (actual time=0.101..0.102 rows=1 loops=1)
   Recheck Cond: (extract_taxids(payload) @> '["54"]'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on t_extract_taxids_idx  (cost=0.00..12.07 rows=10 width=0) (actual time=0.008..0.008 rows=1 loops=1)
         Index Cond: (extract_taxids(payload) @> '["54"]'::jsonb)
 Planning time: 0.128 ms
 Execution time: 0.117 ms

answered Nov 16 '22 00:11

Roman Tkachuk

Related questions
                            
                                How to query date range on the MongoDB collection where the ISO date is stored in string field?
                            
                                HttpMethod.Delete not working with RestTemplate of Spring-Android
                            
                                Create JSON array of strings with jsoncpp
                            
                                Is there a way to specify a simpler JSON (de-)serialization for std::map using Cereal / C++?
                            
                                Along with title and snippet how to add extra data to marker?
                            
                                How to make Jackson throw exception as is when deserialization mapping fail
                            
                                Find and replace for JSON with sed or awk
                            
                                How to map JSON objects with dynamic fields to Go structs
                            
                                Use Jackson to write yaml?
                            
                                Validate JSON schema compliance with Jackson against an external schema file
                            
                                Explicitly output JSON null in case of missing optional value
                            
                                NSJSONSerialization with small decimal numbers
                            
                                Strongloop app does not load local datasource
                            
                                Can I have a JavaScript object key without a value
                            
                                Relations and differences between marshall/unmarshal, encoding/decoding, and serialization/deserialization for JSON?
                            
                                How to handle post data of size more than 2 mb
                            
                                How can I add a check constraint for JSON in Entity Framework?
                            
                                Web API POST parameter is null for large JSON request
                            
                                How to join json list with a table in SQL Server using OPENJSON?
                            
                                Is a primitive type considered JSON?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Postgres - Performance of large jsonb column

Tags:

performance

json

indexing

postgresql

jsonb

Jiaguo Fang

People also ask

1 Answers

Roman Tkachuk

Recent Activity

Donate For Us