Why isn't Postgres using the index with Distinct?

Question

I have this table:

CREATE TABLE public.prodhistory (
  curve_id           int4 NOT NULL,
  start_prod_date    date NOT NULL,
  prod_date          date NOT NULL,
  monthly_prod_rate  float4 NOT NULL,
  eff_date           timestamp NOT NULL,
  /* Keys */
  CONSTRAINT prodhistorypk
    PRIMARY KEY (curve_id, prod_date, start_prod_date, eff_date),
  /* Foreign keys */
  CONSTRAINT prodhistory2typecurves_fk
    FOREIGN KEY (curve_id)
    REFERENCES public.typecurves(curve_id)
) WITH (
    OIDS = FALSE
  );

CREATE INDEX prodhistory_idx_curve_id01
  ON public.prodhistory
  (curve_id);

with ~42M rows.

And I execute this query:

SELECT DISTINCT curve_id FROM prodhistory

I expected this to be very quick, given the index, but it took about 270 seconds. So I ran it through EXPLAIN and got:

HashAggregate  (cost=824870.03..824873.08 rows=305 width=4) (actual time=211834.018..211834.097 rows=315 loops=1)   
  Output: curve_id  
  Group Key: prodhistory.curve_id   
  ->  Seq Scan on public.prodhistory  (cost=0.00..718003.22 rows=42746722 width=4) (actual time=12.751..200826.299 rows=43218808 loops=1)   
        Output: curve_id    
Planning time: 0.115 ms 
Execution time: 211848.137 ms

I'm not too experienced in reading these plans, but a Seq Scan on the DB seems bad.

Any thoughts? I'm sort of stumped.

Laurenz Albe · Accepted Answer

This plan is chosen because PostgreSQL thinks it is cheaper.

You can compare by setting

SET enable_seqscan=off;

and then re-running your EXPLAIN (ANALYZE) statement. Compare cost and actual time in both cases and check if PostgreSQL estimated correctly or not.
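A minimal sketch of that comparison, assuming you run it in a psql session against the same database. Wrapping it in a transaction with SET LOCAL is my own choice so the setting does not leak into the rest of the session, and the BUFFERS option is an optional extra that shows how many pages each plan touched:

```sql
-- Baseline: the plan PostgreSQL picks on its own
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT curve_id FROM prodhistory;

-- Same query with sequential scans discouraged for this transaction only
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT curve_id FROM prodhistory;
ROLLBACK;  -- SET LOCAL is undone when the transaction ends
```

Note that enable_seqscan = off does not forbid sequential scans; it only makes the planner treat them as very expensive, so the second plan shows what PostgreSQL considers the best alternative.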

If you find that using an Index Scan or Index Only Scan is actually cheaper, you could consider twiddling the cost parameters to match your machine better, e.g. lower random_page_cost or cpu_index_tuple_cost or raise cpu_tuple_cost.
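As a rough sketch of such an experiment: the value 1.1 below is only an illustrative assumption (often suggested for SSD storage), and mydb is a hypothetical database name. Try the change at session level first and only persist it if the index plan really is faster:

```sql
-- Session-level experiment; 1.1 is an illustrative value, not a recommendation
SET random_page_cost = 1.1;

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT curve_id FROM prodhistory;

-- Return to the configured default for this session
RESET random_page_cost;

-- To persist the setting for one database (hypothetical name mydb):
-- ALTER DATABASE mydb SET random_page_cost = 1.1;
```

Changing a cost parameter affects every query planned afterwards, so it is worth re-checking a few other important queries before making it permanent.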

Tags: sql, postgresql

Marc
