Indexing jsonb data for pattern matching searches

Tags:

This is a follow-up to:
Pattern matching on jsonb key/value

I have a table as follows

CREATE TABLE "PreStage".transaction (
  transaction_id serial NOT NULL,
  transaction jsonb
  CONSTRAINT pk_transaction PRIMARY KEY (transaction_id)
);

The content in my transaction jsonb column looks like

{"ADDR": "abcd", "CITY": "abcd", "PROV": "",
 "ADDR2": "",
 "ADDR3": "","CNSNT": "Research-NA", "CNTRY": "NL", "EMAIL": "@.com",
             "PHONE": "12345", "HCO_NM": "HELLO", "UNQ_ID": "", 
             "PSTL_CD": "1234", "HCP_SR_NM": "", "HCP_FST_NM": "",
             "HCP_MID_NM": ""}

I need search query like:

SELECT transaction AS data FROM   "PreStage".transaction
WHERE  transaction->>'HCP_FST_NM' ILIKE '%neer%';

But I need to give my user flexibility to search any key/value on the fly.

An answer to the previous question suggested to create index as:

CREATE INDEX idxgin ON "PreStage".transaction
USING gin ((transaction->>'HCP_FST_NM') gin_trgm_ops);

Which works, but I wanted to index other keys, too. Hence was trying something like:

CREATE INDEX idxgin ON "PreStage".transaction USING gin
((transaction->>'HCP_FST_NM'),(transaction->>'HCP_LST_NM') gin_trgm_ops)

Which doesn't work. What would be the best indexing approach here or will I have to create a separate index for each key in which case the approach will not be generic if a new key/value pair is added to the data.

851

asked Oct 08 '15 21:10

Neeraj Gupta

2 Answers

The syntax error that @jjanes pointed out aside,
for a mix of some popular keys (contained in many rows and / or searched often) plus many more rare keys (contained in few rows and / or rarely searched, new keys might pop up dynamically) I suggest this combination:

Trigram indexes for popular keys

It does not seem like you are going to combine multiple keys in one search often, and a single index with many keys would grow very big and slow. So I would create a separate index for each popular key. Make it a partial index for keys that are not contained in most rows:

CREATE INDEX trans_idxgin_HCP_FST_NM ON transaction  -- contained in most rows
USING gin ((transaction->>'HCP_FST_NM') gin_trgm_ops);

CREATE INDEX trans_idxgin_ADDR ON transaction  -- not in most rows
USING gin ((transaction->>'ADDR') gin_trgm_ops)
WHERE transaction ? 'ADDR';

Etc. Like detailed in my previous answer:

Pattern matching on jsonb key/value

Basic jsonb GIN index

If you have many different keys and / or new keys are added dynamically, you can cover the rest with a basic (default) jsonb_ops GIN index:

CREATE INDEX trans_idxgin ON "PreStage".transaction USING gin (transaction);

Among other things, this supports the search for keys. But you cannot use it for pattern matching on values.

What's the proper index for querying structures in arrays in Postgres jsonb?

Query

Combine predicates addressing both indexes:

SELECT transaction AS data
FROM   "PreStage".transaction
WHERE  transaction->>'HCP_FST_NM' ILIKE '%neer%'
AND    transaction ? 'HCP_FST_NM';  -- even if that seems redundant.

The second condition happens to match our partial indexes as well.

So either there is a specific trigram index for the given (popular / common) key, or there is at least an index to find (the few) rows containing the rare key - and then filter for matching values. The same query should give you the best of both worlds.

Be sure to run the latest version of Postgres, there have been various updates for cost estimates recently. It will be crucial that Postgres works with good estimates and current table statistics to choose the best query plan.

153

answered Oct 18 '22 21:10

Erwin Brandstetter

There is no built in index that does precisely what you want, searching for an exact key and a corresponding wild-card matching value, without specifying ahead of time which key(s) to use. It should be possible to create an extension which would do this, but it would be an awful lot of work, and I don't know of any that exist.

Your best option that works out of the box might be to cast the jsonb to text and index that text:

create index on transaction using gin ((transaction::text) gin_trgm_ops);

And then add a secondary condition to your query:

SELECT transaction AS data FROM transaction
WHERE  transaction->>'HCP_FST_NM' ILIKE '%neer%'
AND transaction::text ilike '%neer%';

Now it can use the index to find anything containing 'neer', and then later re-check that 'neer' occurs in the value for the 'HCP_FST_NM' key, as opposed to just some other place in the JSONB.

If your query word occurs in lots of places other than in the value of the desired key, then this might not give you very good performance. For example, if someone searched for:

transaction->>'EMAIL' ilike '%ADDR%'
AND transaction::text ilike '%ADDR%';

The the index would return every row, assuming all records have the same structure as what you show, because every row contains 'ADDR' because used as a key. Then every row would fail the other condition check, but only after doing a lot of work.

answered Oct 18 '22 22:10

jjanes

Related questions
                            
                                How to replicate INSERT/UPDATE/DELETE statements using JPA and Hibernate
                            
                                How should I handle "ranked x out of y" data in PostgreSQL?
                            
                                Put indexes on a table with lots of BOOL/TINYINT?
                            
                                Can Postgres stored functions have both a return value AND out parameters?
                            
                                Dynamically-generated table-name in PostgreSQL COPY command
                            
                                List all UNLOGGED tables in Postgresql database
                            
                                How can I detect if a Postgres temporary table already exists?
                            
                                Error with Rails test database using postgres using hstore
                            
                                Slick way to count NULL and non-NULL rows?
                            
                                how to collect multiple values as a single string in postgres?
                            
                                postgreSQL permission denied when reading from file with \i command
                            
                                Postgres concat all
                            
                                psql ERROR: relation already exists
                            
                                Functions with variable number of input parameters
                            
                                getting ids of multiple rows inserted in psycopg2
                            
                                Can I dry run/sandbox sql commands?
                            
                                PostgreSQL: Update function return boolean
                            
                                What is the equivalent of varchar type in PostgreSQL?
                            
                                Listing rows which have the same value in a certain column in postgresql
                            
                                JPA ERROR: relation does not exist

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Indexing jsonb data for pattern matching searches

Tags:

pattern-matching

indexing

postgresql

jsonb