Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL query doesn't use index

I have a very simple db schema, which has a multi column b-tree index on following columns:

PersonId, Amount, Commission

Now, if I try to select the table with following query:

explain select * from "Order" where "PersonId" = 2 AND "Commission" > 3

Pg is scanning the index and the query is very fast, but if I try the following query:

explain select * from "Order" where "PersonId" > 2 AND "Commission" > 3

It does a sequential scan, even when the index is present. Even this query

explain select * from "Order" where "Commission" > 3 

does a sequential scan. Anyone care to explain why? :-)

Thank you very much.

UPDATE

The table contains 100 million rows. I have created it just to test PostgreSQL performance against MS SQL. The table is already VACUUMED. I'm runnning Core I5 2500k quad core cpu and 8 GB of ram.

Here's the result of explain analyze for this query:

explain ANALYZE select * from "Order" where "Commission" BETWEEN 3000000 AND 3000010  LIMIT 20 


Limit  (cost=0.00..2218328.00 rows=1 width=24) (actual time=28043.249..28043.249 rows=0 loops=1)
  ->  Seq Scan on "Order"  (cost=0.00..2218328.00 rows=1 width=24) (actual time=28043.247..28043.247 rows=0 loops=1)
        Filter: (("Commission" >= 3000000::numeric) AND ("Commission" <= 3000010::numeric))
Total runtime: 28043.278 ms
like image 351
Davita Avatar asked May 13 '12 19:05

Davita


1 Answers

The short answer is that when comparing the various available plans, the sequential scan is expected to be the fastest, based on the costing factors you have configured and the latest statistics available. From what little information you've provided, it seems quite likely that the planner has made the right choice. If you had three single-column indexes, it might be able to use bitmap index scans, particularly if the rows to be selected are less than about 10% of the rows in the table.

Note that with the index you describe, the entire index would need to be scanned from for all rows where "PersonId" > 2; which unless you have a lot of negative values for "PersonId" is very likely to be most of the table.

Also note that if you have a tiny table -- say a few thousand rows or less, accessing the rows through an index will rarely be faster than just scanning those few rows. Plans are sensitive to data volume, and the plan you get with a small number of rows is very unlikely to be the same plan you get with a lot of rows.

If it is, in fact, not picking the fastest plan, the odds are good that you need to adjust your cost factors to better model the costs on your machine. Another possibility is that you need to be more aggressive in your autovacuum settings, to make sure up-to-date statistics are available, or you may need to configure collection of finer-grained statistics.

People will be able to provide more specific advice if you show the table descriptions (including indexes), the EXPLAIN ANALYZE output for the query, and a description of the hardware.

like image 162
kgrittn Avatar answered Nov 15 '22 08:11

kgrittn