Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the proper index for querying structures in arrays in Postgres jsonb?

I'm experimenting with keeping values like the following in a Postgres jsonb field in Postgres 9.4:

[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
 {"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
 {"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]

I'm executing queries like:

SELECT * FROM locations
WHERE EXISTS (
  SELECT 1 FROM jsonb_array_elements(events) AS e
  WHERE (
    e->>'event_slug' = 'test_1' AND
    (
      e->>'start_time' >= '2014-10-30 14:04:06 -0400' OR
      e->>'end_time' >= '2014-10-30 14:04:06 -0400'
    )
  )
)

How would I create an index on that data for queries like the above to utilize? Does this sound reasonable design for a few million rows that each contain ~10 events in that column?

Worth noting that it seems I'm still getting sequential scans with:

CREATE INDEX events_gin_idx ON some_table USING GIN (events);

which I'm guessing is because the first thing I'm doing in the query is converting data to json array elements.

like image 432
Tony Avatar asked Oct 22 '14 02:10

Tony


People also ask

What index type is typically used for Jsonb in PostgreSQL?

When we use ->> operator of JSONB, PostgreSQL can use B-tree or Hash index for processing the operations. ->> operator returns the value of the specified attribute in text format. PostgreSQL can use indexes for the text results as compare operands. GIN index can be used by the GIN JSONB operator class.

How do I query Jsonb data in PostgreSQL?

Querying the JSON documentPostgreSQL has two native operators -> and ->> to query JSON documents. The first operator -> returns a JSON object, while the operator ->> returns text. These operators work on both JSON as well as JSONB columns. There are additional operators available for JSONB columns.

Which index is best in PostgreSQL?

B-tree indexes B-tree is the default index in Postgres and is best used for specific value searches, scanning ranges, data sorting or pattern matching.

What is the most common index type used in PostgreSQL?

The most common and widely used index type is the B-tree index. This is the default index type for the CREATE INDEX command, unless you explicitly mention the type during index creation.


1 Answers

First of all, you cannot access JSON array values like that. For a given json value:

[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
 {"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
 {"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]

A valid test against the first array element would be:

WHERE e->0->>'event_slug' = 'test_1'

But you probably don't want to limit your search to the first element of the array. With the jsonb data type in Postgres 9.4 you have additional operators and index support. To index elements of an array you need a GIN index.

The built-in operator classes for GIN indexes do not support "greater than" or "less than" operators > >= < <=. This is true for jsonb as well, where you can choose between two operator classes. The manual:

Name             Indexed Data Type  Indexable Operators
...
jsonb_ops        jsonb              ? ?& ?| @>
jsonb_path_ops   jsonb              @>

(jsonb_ops being the default.) You can cover the equality test, but neither of those operators covers your requirement for >= comparison. You would need a btree index.

Basic solution

To support the equality check with an index:

CREATE INDEX locations_events_gin_idx ON locations
USING gin (events jsonb_path_ops);

SELECT * FROM locations WHERE events @> '[{"event_slug":"test_1"}]';

This might be good enough if the filter is selective enough.
Assuming end_time >= start_time, so we don't need two checks. Checking only end_time is cheaper and equivalent:

SELECT l.*
FROM   locations l
     , jsonb_array_elements(l.events) e
WHERE  l.events @> '[{"event_slug":"test_1"}]'
AND   (e->>'end_time')::timestamp >= '2014-10-30 14:04:06 -0400'::timestamptz;

Utilizing an implicit JOIN LATERAL. Details (last chapter):

  • PostgreSQL unnest() with element number

Careful with the different data types! What you have in the JSON value looks like timestamp [without time zone], while your predicates use timestamp with time zone literals. The timestamp value is interpreted according to the current time zone setting, while the given timestamptz literals must be cast to timestamptz explicitly or the time zone would be ignored! Above query should work as desired. Detailed explanation:

  • Ignoring time zones altogether in Rails and PostgreSQL

More explanation for jsonb_array_elements():

  • PostgreSQL joining using JSONB

Advanced solution

If the above is not good enough, I would consider a MATERIALIZED VIEW that stores relevant attributes in normalized form. This allows plain btree indexes.

The code assumes that your JSON values have a consistent format as displayed in the question.

Setup:

CREATE TYPE event_type AS (
 , event_slug  text
 , start_time  timestamp
 , end_time    timestamp
);

CREATE MATERIALIZED VIEW loc_event AS
SELECT l.location_id, e.event_slug, e.end_time  -- start_time not needed
FROM   locations l, jsonb_populate_recordset(null::event_type, l.events) e;

Related answer for jsonb_populate_recordset():

  • How to convert PostgreSQL 9.4's jsonb type to float
CREATE INDEX loc_event_idx ON loc_event (event_slug, end_time, location_id);

Also including location_id to allow index-only scans. (See manual page and Postgres Wiki.)

Query:

SELECT *
FROM   loc_event
WHERE  event_slug = 'test_1'
AND    end_time  >= '2014-10-30 14:04:06 -0400'::timestamptz;

Or, if you need full rows from the underlying locations table:

SELECT l.*
FROM  (
   SELECT DISTINCT location_id
   FROM   loc_event
   WHERE  event_slug = 'test_1'
   AND    end_time  >= '2014-10-30 14:04:06 -0400'::timestamptz
   ) le
JOIN locations l USING (location_id);
like image 98
Erwin Brandstetter Avatar answered Oct 13 '22 12:10

Erwin Brandstetter