Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does PostgreSQL perform sequential scan on indexed column?

People also ask

Why is Postgres doing a sequential scan?

A lot of the times statistics are not updated on a table and it may not be possible to do so due to constraints. In this case, the optimizer will not know how many rows it should take in year > 2019. Thus it selects a sequential scan in lieu of full knowledge.

How do I stop index scan in PostgreSQL?

For testing purposes you can force the use of the index by "disabling" sequential scans - best in your current session only: SET enable_seqscan = OFF; Do not use this on a productive server.

Which is faster index scan or sequential scan?

If you need only a single table row, an index scan is much faster than a sequential scan. If you need the whole table, a sequential scan is faster than an index scan. Somewhere between that is the turning point where PostgreSQL switches between these two access methods.

Does order of columns in index matter PostgreSQL?

The order of columns doesn't matter in creating tables in PostgreSQL, but it does matter sometimes in creating indexes in PostgreSQL.


If the SELECT returns more than approximately 5-10% of all rows in the table, a sequential scan is much faster than an index scan.

This is because an index scan requires several IO operations for each row (look up the row in the index, then retrieve the row from the heap). Whereas a sequential scan only requires a single IO for each row - or even less because a block (page) on the disk contains more than one row, so more than one row can be fetched with a single IO operation.

Btw: this is true for other DBMS as well - some optimizations as "index only scans" taken aside (but for a SELECT * it's highly unlikely such a DBMS would go for an "index only scan")


Did you ANALYZE the table/database? And what about the statistics? When there are many records where year > 2009, a sequential scan might be faster than an index scan.


@a_horse_with_no_name explained it quite well. Also if you really want to use an index scan, you should generally use bounded ranges in where clause. eg - year > 2019 and year < 2020.

A lot of the times statistics are not updated on a table and it may not be possible to do so due to constraints. In this case, the optimizer will not know how many rows it should take in year > 2019. Thus it selects a sequential scan in lieu of full knowledge. Bounded partitions will solve the problem most of the time.


In index scan, read head jumps from one row to another which is 1000 times slower than reading the next physical block (in the sequential scan).

So, if the (number of records to be retrieved * 1000) is less than the total number of records, the index scan will perform better.