 

Postgres not using partial timestamp index on interval queries (e.g., now() - interval '7 days' )

I have a simple table that stores precipitation readings from online gauges. Here's the table definition:

    CREATE TABLE public.precip
    (
        gauge_id smallint,
        inches numeric(8, 2),
        reading_time timestamp with time zone
    );

    CREATE INDEX idx_precip3_id
        ON public.precip USING btree
        (gauge_id);

    CREATE INDEX idx_precip3_reading_time
        ON public.precip USING btree
        (reading_time);

    CREATE INDEX idx_precip_last_five_days
        ON public.precip USING btree
        (reading_time)
        TABLESPACE pg_default
        WHERE reading_time > '2017-02-26 00:00:00+00'::timestamp with time zone;

It's grown quite large: about 38 million records that go back 18 months. Queries rarely request rows that are more than 7 days old, so I created the partial index on the reading_time field to let Postgres traverse a much smaller index. But it's not using the partial index for all queries. It does use the partial index for this one:

    explain analyze select * from precip where gauge_id = 208 and reading_time > '2017-02-27'

    Bitmap Heap Scan on precip  (cost=8371.94..12864.51 rows=1169 width=16) (actual time=82.216..162.127 rows=2046 loops=1)
      Recheck Cond: ((gauge_id = 208) AND (reading_time > '2017-02-27 00:00:00+00'::timestamp with time zone))
      ->  BitmapAnd  (cost=8371.94..8371.94 rows=1169 width=0) (actual time=82.183..82.183 rows=0 loops=1)
            ->  Bitmap Index Scan on idx_precip3_id  (cost=0.00..2235.98 rows=119922 width=0) (actual time=20.754..20.754 rows=125601 loops=1)
                  Index Cond: (gauge_id = 208)
            ->  Bitmap Index Scan on idx_precip_last_five_days  (cost=0.00..6135.13 rows=331560 width=0) (actual time=60.099..60.099 rows=520867 loops=1)
    Total runtime: 162.631 ms

But it does not use the partial index for the following query. Instead, it uses the full index on reading_time:

    explain analyze select * from precip where gauge_id = 208 and reading_time > now() - interval '7 days'

    Bitmap Heap Scan on precip  (cost=8460.10..13007.47 rows=1182 width=16) (actual time=154.286..228.752 rows=2067 loops=1)
      Recheck Cond: ((gauge_id = 208) AND (reading_time > (now() - '7 days'::interval)))
      ->  BitmapAnd  (cost=8460.10..8460.10 rows=1182 width=0) (actual time=153.799..153.799 rows=0 loops=1)
            ->  Bitmap Index Scan on idx_precip3_id  (cost=0.00..2235.98 rows=119922 width=0) (actual time=15.852..15.852 rows=125601 loops=1)
                  Index Cond: (gauge_id = 208)
            ->  Bitmap Index Scan on idx_precip3_reading_time  (cost=0.00..6223.28 rows=335295 width=0) (actual time=136.162..136.162 rows=522993 loops=1)
                  Index Cond: (reading_time > (now() - '7 days'::interval))
    Total runtime: 228.647 ms

Note that today is 3/5/2017, so these two queries are requesting essentially the same rows. But it seems like Postgres won't use the partial index unless the timestamp in the where clause is "hard coded". Is the query planner not evaluating now() - interval '7 days' before deciding which index to use? I posted the query plans as suggested by one of the first people to respond.

I've written several functions (stored procedures) that summarize rainfall in the last 6 hours, 12 hours .... 72 hours. They all use the interval approach in the query (e.g., reading_time > now() - interval '7 days'). I don't want to move this code into the application so that it sends Postgres a hard-coded timestamp. That would create a lot of messy PHP code that shouldn't be necessary.

Suggestions on how to encourage Postgres to use the partial index instead? My plan is to redefine the date range on the partial index nightly (drop index --> create index), but that seems a bit silly if Postgres isn't going to use it.
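
For reference, the nightly rebuild described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the 7-day cutoff and the use of a DO block are choices made for the example, and a plain DROP/CREATE INDEX like this blocks writes to the table while the index is built. An index predicate cannot call now() directly, so the cutoff is computed first and spliced into the statement as a literal.

```sql
-- Sketch: rebuild the partial index with a fresh, literal cutoff.
-- Assumption: a 7-day window measured from midnight of the current day.
DO $$
DECLARE
    cutoff timestamptz := date_trunc('day', now()) - interval '7 days';
BEGIN
    EXECUTE 'DROP INDEX IF EXISTS public.idx_precip_last_five_days';
    -- %L quotes the computed timestamp as a literal, so the stored
    -- predicate is a constant just like the hand-written one.
    EXECUTE format(
        'CREATE INDEX idx_precip_last_five_days
             ON public.precip USING btree (reading_time)
             WHERE reading_time > %L::timestamptz',
        cutoff);
END;
$$;
```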

Thanks,

Alex

Debaser asked Oct 30 '25 01:10



1 Answer

Generally speaking, an index can be used when the indexed column(s) are compared to constants (literal values), to calls of functions marked at least STABLE (meaning that, within a single statement, multiple calls of the function with the same parameters produce the same result), or to combinations of those.

now() (which is equivalent to current_timestamp) is marked as STABLE. Because now() returns timestamp with time zone, the function behind the operator here is timestamptz_mi_interval(), which is also STABLE; its counterpart for plain timestamp values, timestamp_mi_interval(), is IMMUTABLE, which is even stricter than STABLE. (now(), current_timestamp and transaction_timestamp() mark the start of the transaction; statement_timestamp() marks the start of the statement and is still STABLE; clock_timestamp() gives the timestamp as seen on a clock, so it is VOLATILE.)
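
These volatility markings can be checked in the system catalog. The query below is a small sketch that looks up the functions mentioned here in pg_catalog.pg_proc; the column alias is just a label chosen for the example.

```sql
-- provolatile: 'i' = IMMUTABLE, 's' = STABLE, 'v' = VOLATILE
SELECT p.oid::regprocedure AS function_signature,
       p.provolatile
FROM   pg_catalog.pg_proc AS p
WHERE  p.proname IN ('now', 'statement_timestamp', 'clock_timestamp',
                     'timestamp_mi_interval', 'timestamptz_mi_interval')
ORDER  BY p.proname;
```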

So in theory, WHERE reading_time > now() - interval '7 days' should be able to use an index on the reading_time column, and it does: your second plan uses idx_precip3_reading_time. But since you defined a partial index, the planner needs to prove something more. As the PostgreSQL documentation on partial indexes puts it:

However, keep in mind that the predicate must match the conditions used in the queries that are supposed to benefit from the index. To be precise, a partial index can be used in a query only if the system can recognize that the WHERE condition of the query mathematically implies the predicate of the index. PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time.

And that is what is happening with your query, which has reading_time > now() - interval '7 days'. The expression now() - interval '7 days' is only evaluated at execution time, after planning has already happened, so PostgreSQL couldn't prove that the index predicate (reading_time > '2017-02-26 00:00:00+00') will be true for every row the query asks for. But when you used reading_time > '2017-02-27', it could prove that.

You could "guide" the planner with constant values, like this:

    select *
    from  precip
    where gauge_id = 208
    and   reading_time > '2017-02-26 00:00:00+00'
    and   reading_time > now() - interval '7 days'

This way the planner realizes that it can use the partial index, because indexed_col > index_condition and indexed_col > something_else implies that indexed_col will be larger than (at least) index_condition. Maybe it will be larger than something_else too, but that doesn't matter for using the index.
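
Since the queries in question live in functions, another option (not part of the approach above, just a sketch) is dynamic SQL in PL/pgSQL: the function computes the cutoff and splices it into the query text as a literal, so the planner sees a constant it can compare with the index predicate. The function name precip_since and its parameters are hypothetical. The partial index still only qualifies when the computed cutoff is not earlier than the index's own cutoff.

```sql
-- Sketch: hypothetical helper that plans the query with a literal timestamp.
CREATE OR REPLACE FUNCTION precip_since(p_gauge smallint, p_window interval)
RETURNS SETOF public.precip
LANGUAGE plpgsql
STABLE
AS $$
BEGIN
    -- %L turns each argument into a quoted literal, so the statement is
    -- planned as if the timestamp had been hard coded by the caller.
    RETURN QUERY EXECUTE format(
        'SELECT * FROM public.precip
          WHERE gauge_id = %L
            AND reading_time > %L::timestamptz',
        p_gauge, now() - p_window);
END;
$$;
```

It could then be called as select * from precip_since(208::smallint, interval '7 days'), keeping the timestamp arithmetic inside Postgres rather than in application code.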

I'm not sure if that is the answer you were looking for though. IMHO, if you have a really large amount of data (and PostgreSQL 9.5+) a single BRIN index might suit your needs better.
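
A minimal sketch of the BRIN alternative follows; the index name is made up for the example. BRIN tends to work well when the physical row order roughly follows reading_time, which is likely (but here only assumed) for a table that readings are appended to over time.

```sql
-- Sketch: one small block-range index over the whole table (PostgreSQL 9.5+).
CREATE INDEX idx_precip_reading_time_brin
    ON public.precip USING brin
    (reading_time);

-- Check whether the planner picks it up for the interval query.
EXPLAIN ANALYZE
SELECT *
FROM   public.precip
WHERE  gauge_id = 208
AND    reading_time > now() - interval '7 days';
```

Because such an index has no predicate, nothing needs to be proven at planning time and no nightly rebuild is required.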

pozs answered Oct 31 '25 17:10
