
Why this query is not using index only scan in postgresql

Tags: sql, postgresql

I have a table with 16 columns, in which there are a primary key and a column to store values. I want to select all the values in a certain range. The value column (easyid) has been indexed.

create table tb1 (
    id int primary key,
    easyid int,
    .....
);
create index i_easyid on tb1 (easyid);

Other info: PostgreSQL 9.4, autovacuum disabled. The SQL is:

select "easyid" from "tb1" where "easyid" between 12183318 and 82283318

Theoretically PostgreSQL should use an index-only scan on i_easyid, but it only does so when the range "easyid" between A and B is small. When the range is large, i.e. B-A is a pretty big number, PostgreSQL uses a bitmap index scan on i_easyid followed by a bitmap heap scan on tb1.

I was wrong to say that whether an index-only scan is used depends on the range size. I tried the same query with different parameters; sometimes it is an index-only scan and sometimes it is not.

The table tb1 is very large, about 17 GB; i_easyid is 600 MB.

Here is the EXPLAIN ANALYZE output. I don't understand why fetching about 5000 rows can take more than 10 seconds.

sample_pg=# explain analyze select easyid from tb1 where "easyid" between 152183318 and 152283318;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tb1  (cost=97.70..17227.71 rows=4416 width=4) (actual time=1.155..14346.311 rows=5004 loops=1)
   Recheck Cond: ((easyid >= 152183318) AND (easyid <= 152283318))
   Heap Blocks: exact=4995
   ->  Bitmap Index Scan on i_easyid  (cost=0.00..96.60 rows=4416 width=0) (actual time=0.586..0.586 rows=5004 loops=1)
         Index Cond: ((easyid >= 152183318) AND (easyid <= 152283318))
 Planning time: 0.080 ms
 Execution time: 14348.037 ms
(7 rows)

Here is an example of index only scan:

sample_pg=# explain analyze verbose select easyid from tb1 where "easyid" between 32280318 and 32283318;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i_easyid on public.tb1  (cost=0.44..281.82 rows=69 width=4) (actual time=14.585..160.624 rows=33 loops=1)
   Output: easyid
   Index Cond: ((tb1.easyid >= 32280318) AND (tb1.easyid <= 32283318))
   Heap Fetches: 33
 Planning time: 0.085 ms
 Execution time: 160.654 ms
(6 rows)
asked Apr 06 '15 by worldterminator
People also ask

Why is Postgres query not using index?

There are two main reasons that Postgres will not use an index: either it can't use the index, or it doesn't think using the index will be faster. Working out which of these applies in your case is a great starting point.

What is index-only scan?

An index-only scan, after finding a candidate index entry, checks the visibility map bit for the corresponding heap page. If it's set, the row is known visible and so the data can be returned with no further work.


2 Answers

I'm not 100% sure, but I suspect that PostgreSQL believes it will be faster to read the table than the index, because of random_page_cost. The index-order read is potentially higher cost because of the need to fetch essentially random heap pages.

The page references found in the index need sorting into physical order before the heap is read, but the cost calculations evidently suggest that the total cost of (bitmap index scan + sorted heap reads) is lower than that of (random index-order heap fetches).

This is partially testable by changing the value of random_page_cost, which would be worth investigating if you're using very fast disks or an SSD anyway.
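A minimal way to run that test, session-local so nothing persists (assuming the same table and query as in the question):

```sql
-- Lower random_page_cost for this session only. The default is 4.0;
-- values close to 1.0 are common for SSD-backed storage.
SET random_page_cost = 1.1;

-- Re-run the plan and see whether it switches to an index-only scan.
EXPLAIN ANALYZE
SELECT easyid FROM tb1 WHERE easyid BETWEEN 152183318 AND 152283318;

-- Restore the configured default for this session when done.
RESET random_page_cost;
```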

answered Oct 28 '22 by David Aldridge

autovacuum is not running

PostgreSQL index-only scans require some information about which rows are "visible" to current transactions - i.e. not deleted, not old versions of updated rows, and not uncommitted inserts or new versions of updates.

This information is kept in the "visibility map".

The visibility map is maintained by VACUUM, usually in the background by autovacuum workers.

If autovacuum is not keeping up with write activity well, or if autovacuum has been disabled, then index-only scans probably won't be used because PostgreSQL will see that the visibility map does not have data for enough of the table.

Turn autovacuum back on, then manually VACUUM the table to bring it up to date immediately.
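Concretely, that could look like this (a sketch; ALTER SYSTEM is available from PostgreSQL 9.4, and tb1 is the table from the question):

```sql
-- Re-enable autovacuum cluster-wide, then reload the configuration.
ALTER SYSTEM SET autovacuum = on;
SELECT pg_reload_conf();

-- Vacuum and analyze the table once by hand so the visibility map
-- and the planner statistics are current right away.
VACUUM (ANALYZE, VERBOSE) tb1;
```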

BTW, in addition to visibility map information, VACUUM (including autovacuum) can also set hint bits that make SELECTs of recently inserted/updated data faster.

Autovacuum also maintains table statistics that are vital for effective query planning. Turning it off will result in the planner using increasingly stale information.

It is also absolutely vital for preventing transaction ID wraparound, an emergency condition that can force the whole database into emergency shutdown until a time-consuming VACUUM is performed.

Do not turn autovacuum off.

As for why it's sometimes using an index-only scan and sometimes not, a few possibilities:

  • The current random_page_cost setting makes it think that random I/O will be slower than it really is, so it tries harder to avoid it;

  • The table statistics, especially the limit values, are outdated. So it doesn't realise that there's a good chance the value being looked for will be discovered quickly in an index-only scan;

  • The visibility map is outdated, so it thinks an index-only scan will find too many values that will require heap fetches to check, making it slower than other methods especially if it thinks the proportion of values likely to be found is high.
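The third point can be checked directly: pg_class records how many of a table's heap pages are known all-visible (relallvisible, maintained since 9.2). If that number is small relative to relpages, the planner will price index-only scans as requiring many heap fetches. A sketch against the question's table:

```sql
-- Fraction of heap pages marked all-visible in the visibility map.
SELECT relpages,
       relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1)
           AS pct_all_visible
FROM pg_class
WHERE relname = 'tb1';
```

A low pct_all_visible after re-enabling autovacuum usually just means VACUUM hasn't caught up with the table yet.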

Most of these issues are fixed by leaving autovacuum alone. In fact, on frequently appended tables you should configure autovacuum to run much more often than the default, so that it refreshes the limit statistics more frequently. (That helps work around a planner issue with tables where the most frequently queried data is also the most recently inserted: with an incrementing ID or timestamp, the most-desired values are never in the table's histograms and limit stats.)

Go turn autovacuum back on - then turn it up.
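"Turning it up" can be done per table with storage parameters; the values below are illustrative, not a one-size-fits-all recommendation:

```sql
-- Trigger autovacuum/autoanalyze after ~1% of rows change instead of
-- the defaults (20% for vacuum, 10% for analyze), so the statistics
-- and the visibility map stay fresh on a busy, growing table.
ALTER TABLE tb1 SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.01
);
```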

answered Oct 28 '22 by Craig Ringer