Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Index not used when LIMIT is used in postgres

I have a words table with an index on (language_id, state). Here are the results for EXPLAIN ANALYZE:

No limit

explain analyze SELECT "words".* FROM "words" WHERE (words.language_id = 27) AND (state IS NULL);

Bitmap Heap Scan on words  (cost=10800.38..134324.10 rows=441257 width=96) (actual time=233.257..416.026 rows=540556 loops=1)
Recheck Cond: ((language_id = 27) AND (state IS NULL))
->  Bitmap Index Scan on ls  (cost=0.00..10690.07 rows=441257 width=0) (actual time=230.849..230.849 rows=540556 loops=1)
Index Cond: ((language_id = 27) AND (state IS NULL))
Total runtime: 460.277 ms
(5 rows)

Limit 100

explain analyze SELECT "words".* FROM "words" WHERE (words.language_id = 27) AND (state IS NULL) LIMIT 100;

Limit  (cost=0.00..51.66 rows=100 width=96) (actual time=0.081..0.184 rows=100 loops=1)
->  Seq Scan on words  (cost=0.00..227935.59 rows=441257 width=96) (actual time=0.080..0.160 rows=100 loops=1)
Filter: ((state IS NULL) AND (language_id = 27))
Total runtime: 0.240 ms
(4 rows)

Why is this happening? How can I get the index to be used in all cases?

Thanks.

like image 468
alste Avatar asked Dec 19 '11 20:12

alste


People also ask

Why is my index not being used Postgres?

As such, if Postgres chooses not to use your index, it is likely that it is calculating the cost of that query plan to be higher than the one it has chosen.

Can we use limit in PostgreSQL?

The PostgreSQL LIMIT clause is used to get a subset of rows generated by a query. It is an optional clause of the SELECT statement. The LIMIT clause can be used with the OFFSET clause to skip a specific number of rows before returning the query for the LIMIT clause.

How does PostgreSQL limit work?

The limit is an optional clause of the PostgreSQL select statement used to fetch a limited number of rows from the whole table. This clause is also used with an offset clause to fetch records from the table. We can use this clause with an order by clause to find ascending and descending numbers.


2 Answers

I think that the PostreSQL query planner just thinks that in the second case - the one with the LIMIT - it's not worth applying the index as it [the LIMIT] is too small. So it's not an issue.

like image 139
Marek Příhoda Avatar answered Oct 24 '22 18:10

Marek Příhoda


Take a look at the PostgreSQL documentation about Using EXPLAIN and Query Planning. The reason for the query planner to prefer a sequential scan over an index scan in the LIMIT 100 case is simply because the sequential scan is cheaper.

There is no ORDER BY clause in the query, so the planner is ok with the first 100 (random) rows that match the filter condition. An index scan would require to read the index pages first and then read the data pages to fetch the according rows. The sequential scan only needs to read the data pages to fetch the rows. In your case table statistics seem to suggest that there are enough (random) rows that match the filter condition. The cost of sequential page reads to get the 100 rows is considered cheaper than the cost of reading the index first and then fetch the actual rows. You might see a different plan when you raise the limit or when less rows match the filter condition.

With the default settings the planner considers the cost of a random page read (random_page_cost) four times the cost of a sequential page read (seq_page_cost). These settings can be adjusted to tune query plans (e.g. when the whole database is in RAM a random page read is not more expensive than a sequential page read and an index scan should be preferred). You can also try out different query plans by enabling/disabling certain kinds of scans, e.g:

set enable_seqscan = [on | off]
set enable_indexscan = [on | off]

While it is possible to enable/disable certain kinds of scans on a global basis this should be only used ad hoc for debugging or troubleshooting on a per session basis.

Also run VACUUM ANALYZE words before you test the query plans, otherwise an automatic vacuum (autovaccum) run between the tests might influence the results.

like image 21
tscho Avatar answered Oct 24 '22 17:10

tscho