Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is Postgresql using filter instead of index?

I have my postgresql ad_item table with the following columns.

id | name | remaining | created_at

And then I have my index on remaining which are less than or equal to 300.

create index remaining_index on ad_item using btree(id) where remaining <= 300

But when I do explain analyze on my query.

explain analyze select id from ad_item where remaining <= 300

It is showing me the following.

Seq Scan on ad_item  (cost=0.00..2.06 rows=1 width=4) (actual time=0.010..0.013 rows=1 loops=1)
  Filter: (remaining <= '300'::numeric)
  Rows Removed by Filter: 4
Planning time: 0.115 ms
Execution time: 0.026 ms

Why is it not using my remaining_index? Is that index redundant?

Thanks.

like image 686
moeseth Avatar asked Dec 20 '16 08:12

moeseth


1 Answers

You can make a very apx calculations to get the idea when it starts using your index.

t=# drop table s07;
DROP TABLE
t=# create table s07 (i int, r int, t text);
CREATE TABLE
t=# insert into s07 select 1,1,'some text';
INSERT 0 1
t=# insert into s07 select 2,2,'some text';
INSERT 0 1
t=# insert into s07 select 3,3,'some text';
INSERT 0 1
t=# insert into s07 select 4,4,'some text';
INSERT 0 1
t=# create index s07i on s07 (i);
CREATE INDEX
t=# analyze s07;
ANALYZE
t=# SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 's07%';
 relname | relkind | reltuples | relpages
---------+---------+-----------+----------
 s07     | r       |         4 |        1
 s07i    | i       |         4 |        2
(2 rows)

despite the fact that index has same amount of rows and less columns, on small amount of data it actually takes twice more space! relation -1 page, index -2, so planner performs seq scan:

t=# explain analyze select i from s07 where i < 4;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on s07  (cost=0.00..1.05 rows=4 width=4) (actual time=0.003..0.004 rows=3 loops=1)
   Filter: (i < 4)
   Rows Removed by Filter: 1
 Planning time: 0.086 ms
 Execution time: 0.013 ms
(5 rows)

So after populating some data:

t=# insert into s07 select i,i,'some text' from generate_series(1,99,1) i;
INSERT 0 99
t=# analyze s07;
ANALYZE
t=# SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 's07%';
 relname | relkind | reltuples | relpages
---------+---------+-----------+----------
 s07     | r       |       103 |        1
 s07i    | i       |       103 |        2
(2 rows)

t=# explain analyze select i from s07 where i < 4;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on s07  (cost=0.00..2.29 rows=7 width=4) (actual time=0.008..0.016 rows=6 loops=1)
   Filter: (i < 4)
   Rows Removed by Filter: 97
 Planning time: 0.119 ms
 Execution time: 0.029 ms
(5 rows)

same picture. so putting even more data:

t=# insert into s07 select i,i,'some text' from generate_series(1,299,1) i;
INSERT 0 299
t=# analyze s07;
ANALYZE
t=# SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 's07%';
 relname | relkind | reltuples | relpages
---------+---------+-----------+----------
 s07     | r       |       402 |        3
 s07i    | i       |       402 |        2
(2 rows)

t=# explain analyze select i from s07 where i < 4;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on s07  (cost=4.22..7.33 rows=9 width=4) (actual time=0.005..0.007 rows=9 loops=1)
   Recheck Cond: (i < 4)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on s07i  (cost=0.00..4.21 rows=9 width=0) (actual time=0.002..0.002 rows=9 loops=1)
         Index Cond: (i < 4)
 Planning time: 0.099 ms
 Execution time: 0.017 ms
(7 rows)

With conditional index such calculation would be more sophisticated of course, but basically here you can see, that even 100 rows are cheaper to filter from one page, then loading two pages.

Now of course you can change this behavior with config, eg when we had 100 rows, if you run:

t=# set cpu_tuple_cost to 1;
SET
t=# set cpu_index_tuple_cost to 0.000001;
SET
t=# explain analyze select i from s07 where i < 4;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Index Only Scan using s07i on s07  (cost=0.14..15.16 rows=7 width=4) (actual time=0.012..0.014 rows=6 loops=1)
   Index Cond: (i < 4)
   Heap Fetches: 6
 Planning time: 0.058 ms
 Execution time: 0.028 ms
(5 rows)

though the number of pages is for Seq scan:

t=# SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 's07%';
 relname | relkind | reltuples | relpages
---------+---------+-----------+----------
 s07     | r       |       103 |        1
 s07i    | i       |       103 |        2
(2 rows)

And surely you can check execution plan and time with temporarily set enable_seqscan=off

Update

100 rows even mentioned in docs as too small amount for indes scan:

selecting 1 out of 100 rows will hardly be (a candidate for an index), because the 100 rows probably fit within a single disk page, and there is no plan that can beat sequentially fetching 1 disk page.

like image 61
Vao Tsun Avatar answered Jan 17 '23 23:01

Vao Tsun