I have this table:
CREATE TABLE public.prodhistory (
curve_id int4 NOT NULL,
start_prod_date date NOT NULL,
prod_date date NOT NULL,
monthly_prod_rate float4 NOT NULL,
eff_date timestamp NOT NULL,
/* Keys */
CONSTRAINT prodhistorypk
PRIMARY KEY (curve_id, prod_date, start_prod_date, eff_date),
/* Foreign keys */
CONSTRAINT prodhistory2typecurves_fk
FOREIGN KEY (curve_id)
REFERENCES public.typecurves(curve_id)
) WITH (
OIDS = FALSE
);
CREATE INDEX prodhistory_idx_curve_id01
ON public.prodhistory
(curve_id);
with ~42M rows.
And I execute this query:
SELECT DISTINCT curve_id FROM prodhistory
Which I expect would be very quick, given the index. But no, 270 secs. So I explain, and I get:
HashAggregate (cost=824870.03..824873.08 rows=305 width=4) (actual time=211834.018..211834.097 rows=315 loops=1)
Output: curve_id
Group Key: prodhistory.curve_id
-> Seq Scan on public.prodhistory (cost=0.00..718003.22 rows=42746722 width=4) (actual time=12.751..200826.299 rows=43218808 loops=1)
Output: curve_id
Planning time: 0.115 ms
Execution time: 211848.137 ms
I'm not to experienced in reading these plans, but a Seq Scan on the DB seems bad.
Any thoughts? I'm sort of stumped.
This plan is chosen because PostgreSQL thinks it is cheaper.
You can compare by setting
SET enable_seqscan=off;
and then re-running your EXPLAIN (ANALYZE)
statement. Compare cost
and actual time
in both cases and check if PostgreSQL estimated correctly or not.
If you find that using an Index Scan
or Index Only Scan
is actually cheaper, you could consider twiddling the cost parameters to match your machine better, e.g. lower random_page_cost
or cpu_index_tuple_cost
or raise cpu_tuple_cost
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With