[PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.2.0-10) 6.2.0 20161027, 64-bit]
I have a table with timestamp ranges:
create table testing.test as
select tsrange(d, null) ts from
generate_series(timestamp '2000-01-01', timestamp '2018-01-01', interval '1 minute') s(d);
I need to run the following query:
select *
from testing.test
where lower(ts)> '2017-06-17 20:00:00'::timestamp and upper_inf(ts)
Explain analyze result for table without indexes:
Seq Scan on test (cost=0.00..72482.26 rows=1052013 width=14) (actual time=2165.477..2239.781 rows=283920 loops=1)
Filter: (upper_inf(ts) AND (lower(ts) > '2017-06-17 20:00:00'::timestamp without time zone))
Rows Removed by Filter: 9184081
Planning time: 0.046 ms
Execution time: 2250.221 ms
Next I'm going to add the following partial index:
create index lower_rt_inf ON testing.test using btree(lower(ts)) where upper_inf(ts);
analyze testing.test;
Explain analyze result for table with partial index:
Index Scan using lower_rt_inf on test (cost=0.04..10939.03 rows=1051995 width=14) (actual time=0.037..52.083 rows=283920 loops=1)
Index Cond: (lower(ts) > '2017-06-17 20:00:00'::timestamp without time zone)
Planning time: 0.156 ms
Execution time: 62.900 ms
And:
SELECT null_frac, n_distinct, correlation FROM pg_catalog.pg_stats WHERE tablename = 'lower_rt_inf'
null_frac |n_distinct |correlation |
----------|-----------|------------|
0 |-1 |1 |
Then I create an index similar to the previous one, but without the partial condition:
create index lower_rt_full ON testing.test using btree(lower(ts));
analyze testing.test;
And now the same index is used, but the cost/rows are different:
Index Scan using lower_rt_inf on test (cost=0.04..1053.87 rows=101256 width=14) (actual time=0.029..58.613 rows=283920 loops=1)
Index Cond: (lower(ts) > '2017-06-17 20:00:00'::timestamp without time zone)
Planning time: 0.280 ms
Execution time: 71.794 ms
And a bit more:
select * from testing.test where lower(ts)> '2017-06-17 20:00:00'::timestamp;
Index Scan using lower_rt_full on test (cost=0.04..3159.52 rows=303767 width=14) (actual time=0.036..64.208 rows=283920 loops=1)
Index Cond: (lower(ts) > '2017-06-17 20:00:00'::timestamp without time zone)
Planning time: 0.099 ms
Execution time: 78.759 ms
How can I effectively use partial indexes for expressions?
What happens here is that the statistics on index lower_rt_full are used to estimate the row count, but statistics on lower_rt_inf, which is a partial index, aren't.
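Since functions are a black box for PostgreSQL, it has no idea about the distribution of lower(ts) and falls back on a default guess, which gives a bad estimate (1052013 rows estimated versus 283920 actual in your first plan).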
After lower_rt_full has been created and the table analyzed, PostgreSQL has a good idea about this distribution and can estimate much better. Even if the index isn't used to execute the query, it is used for query planning.
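The numbers in the plans can be checked with some quick arithmetic. This is only a sketch: it assumes that the planner falls back on a default selectivity of about 1/3 for a condition it has no statistics for, which is my reading of the estimates above rather than something the plans state. The total row count 9468001 is the sum of the returned and filtered rows from the sequential scan.

```sql
-- Without any usable statistics, both conditions get the assumed
-- default selectivity of 1/3, so the estimate is about one ninth of
-- the table. Compare with rows=1052013 in the first plan.
select round(9468001 * 0.3333333 * 0.3333333) as estimate_without_stats;

-- With lower_rt_full analyzed, lower(ts) > ... alone is estimated at
-- 303767 rows (last plan). Only upper_inf(ts) still gets the assumed
-- default of 1/3. Compare with rows=101256 in the plan before it.
select round(303767 * 0.3333333) as estimate_with_lower_stats;
```

If that assumption holds, the remaining error comes entirely from upper_inf(ts), which is in fact true for every row of this table.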
Since upper_inf is also a function (black box), you would get an even better estimate if you had an index ON test (upper_inf(ts), lower(ts)).
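A minimal sketch of that suggestion follows. The index name lower_rt_both is hypothetical, and whether the planner actually applies the gathered statistics to a bare boolean condition like upper_inf(ts) may depend on the PostgreSQL version, so compare the estimated and actual row counts in the EXPLAIN output rather than taking the improvement for granted.

```sql
-- Non-partial index on both expressions, so that ANALYZE gathers
-- whole-table statistics for upper_inf(ts) and lower(ts).
-- The name lower_rt_both is made up for this example.
create index lower_rt_both on testing.test using btree (upper_inf(ts), lower(ts));
analyze testing.test;

-- Look at the statistics collected for the index expressions.
select attname, null_frac, n_distinct, correlation
from pg_catalog.pg_stats
where tablename = 'lower_rt_both';

-- Re-run the original query and compare estimated with actual rows.
explain analyze
select *
from testing.test
where lower(ts) > '2017-06-17 20:00:00'::timestamp and upper_inf(ts);
```

The partial index lower_rt_inf can stay in place for executing the query. The non-partial index is there mainly to feed the planner's estimates.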
For an explanation of why partial indexes are not considered when estimating the number of result rows, see this comment in examine_variable in backend/utils/adt/selfuncs.c, the function that tries to find statistical data about an expression:
 * Has it got stats?  We only consider stats for
 * non-partial indexes, since partial indexes probably
 * don't reflect whole-relation statistics; the above
 * check for uniqueness is the only info we take from
 * a partial index.