Below is my query. I am trying to get it to use an index scan, but it will only seq scan.
By the way the metric_data
table has 130 million rows. The metrics
table has about 2000 rows.
metric_data
table columns:
metric_id integer , t timestamp , d double precision , PRIMARY KEY (metric_id, t)
How can I get this query to use my PRIMARY KEY index?
SELECT S.metric, D.t, D.d FROM metric_data D INNER JOIN metrics S ON S.id = D.metric_id WHERE S.NAME = ANY (ARRAY ['cpu', 'mem']) AND D.t BETWEEN '2012-02-05 00:00:00'::TIMESTAMP AND '2012-05-05 00:00:00'::TIMESTAMP;
EXPLAIN:
Hash Join (cost=271.30..3866384.25 rows=294973 width=25) Hash Cond: (d.metric_id = s.id) -> Seq Scan on metric_data d (cost=0.00..3753150.28 rows=29336784 width=20) Filter: ((t >= '2012-02-05 00:00:00'::timestamp without time zone) AND (t <= '2012-05-05 00:00:00'::timestamp without time zone)) -> Hash (cost=270.44..270.44 rows=68 width=13) -> Seq Scan on metrics s (cost=0.00..270.44 rows=68 width=13) Filter: ((sym)::text = ANY ('{cpu,mem}'::text[]))
To prevent Postgres using an index only scan we select more columns than the index contains. The index scan is much faster. The next step is to look at the the query by changing the condition to match all rows. An index scan is still faster, but the percentage of the difference is far smaller.
Forcing a specific index to the SQL query is not possible in the current release of PostgreSQL; however, you can somehow guide the planner to pick the index scan over the other bitmap and sequential scans by disabling the session level optimizer parameters.
The two main reasons. There are two main reasons that Postgres will not use an index. Either it can't use the index, or it doesn't think using the index will be faster.
For testing purposes you can force the use of the index by "disabling" sequential scans - best in your current session only:
SET enable_seqscan = OFF;
Do not use this on a productive server. Details in the manual here.
I quoted "disabling", because you cannot actually disable sequential table scans. But any other available option is now preferable for Postgres. This will prove that the multicolumn index on (metric_id, t)
can be used - just not as effective as an index on the leading column.
You probably get better results by switching the order of columns in your PRIMARY KEY
(and the index used to implement it behind the curtains with it) to (t, metric_id)
. Or create an additional index with reversed columns like that.
You do not normally have to force better query plans by manual intervention. If setting enable_seqscan = OFF
leads to a much better plan, something is probably not right in your database. Consider this related answer:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With