Why is Postgres not using index on a simple GROUP BY?

Tags: indexing, postgresql, group-by

I have created a 36M-row table with an index on the type column:

CREATE TABLE items AS
  SELECT
    (random()*36000000)::integer AS id,
    (random()*10000)::integer AS type,
    md5(random()::text) AS s
  FROM
    generate_series(1,36000000);
CREATE INDEX items_type_idx ON items USING btree ("type");

I run this simple query and expect PostgreSQL to use my index:

explain analyze select count(*) from "items" group by "type";

But the query planner chooses a Seq Scan instead:

HashAggregate  (cost=734592.00..734627.90 rows=3590 width=12) (actual time=6477.913..6478.344 rows=3601 loops=1)
  Group Key: type
  ->  Seq Scan on items  (cost=0.00..554593.00 rows=35999800 width=4) (actual time=0.044..1820.522 rows=36000000 loops=1)
Planning time: 0.107 ms
Execution time: 6478.525 ms

Time without EXPLAIN: 5s 979ms

I have tried several solutions suggested in related answers:

Run VACUUM ANALYZE
Tune default_statistics_target, random_page_cost, work_mem

but nothing helps apart from setting enable_seqscan = OFF:

SET enable_seqscan = OFF;
explain analyze select count(*) from "items" group by "type";

GroupAggregate  (cost=0.56..1114880.46 rows=3590 width=12) (actual time=5.637..5256.406 rows=3601 loops=1)
  Group Key: type
  ->  Index Only Scan using items_type_idx on items  (cost=0.56..934845.56 rows=35999800 width=4) (actual time=0.074..2783.896 rows=36000000 loops=1)
        Heap Fetches: 0
Planning time: 0.103 ms
Execution time: 5256.667 ms

Time without EXPLAIN: 659ms

The query with the index-only scan is about 10x faster on my machine.

Is there a better solution than setting enable_seqscan?
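If the enable_seqscan workaround is kept, one way to limit its reach is to scope it to a single transaction with SET LOCAL, which reverts at COMMIT or ROLLBACK. A minimal sketch against the table from the question; note that the setting only discourages seq scans by inflating their estimated cost, it does not forbid them:

```sql
BEGIN;
-- SET LOCAL lasts only until the end of this transaction, so other
-- queries on the same connection keep the normal planner behaviour.
SET LOCAL enable_seqscan = off;
select count(*) from "items" group by "type";
COMMIT;

-- Back to the session/server value here (normally 'on').
SHOW enable_seqscan;
```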

UPD1

My PostgreSQL version is 9.6.3; work_mem = 4MB (tried 64MB), random_page_cost = 4 (tried 1.1), max_parallel_workers_per_gather = 0 (tried 4).
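Since the planner picks between the two plans purely by these cost parameters, it can help to confirm what the session actually uses. A sketch listing the settings mentioned above, plus seq_page_cost, cpu_tuple_cost and cpu_index_tuple_cost, which also feed into the seq scan vs. index-only scan comparison:

```sql
-- Current values of the planner settings discussed in this question.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('work_mem', 'random_page_cost', 'seq_page_cost',
               'cpu_tuple_cost', 'cpu_index_tuple_cost',
               'max_parallel_workers_per_gather')
ORDER BY name;

SELECT version();
```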

UPD2

I also tried filling the type column with i / 10000 instead of random numbers, so that pg_stats.correlation = 1, but the planner still chooses a seq scan.
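A sketch of the correlated variant described in UPD2, loaded into a separate, hypothetically named table items_sorted so the original stays untouched, followed by a look at the statistics the planner sees. Because type = i / 10000 grows with insertion order, correlation should come out close to 1:

```sql
-- Hypothetical copy of the test table with a physically ordered "type".
CREATE TABLE items_sorted AS
  SELECT
    (random()*36000000)::integer AS id,
    i / 10000 AS type,            -- integer division: values 0..3600
    md5(random()::text) AS s
  FROM
    generate_series(1,36000000) AS i;
CREATE INDEX items_sorted_type_idx ON items_sorted USING btree ("type");
ANALYZE items_sorted;

-- What the planner knows about the column.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'items_sorted' AND attname = 'type';
```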

UPD3

@JGH is 100% right:

This typically only happens when the table's row width is much wider than some indexes

I made the row data much larger (wider columns), and now Postgres uses the index. Thanks everyone!
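A sketch of the kind of experiment described in UPD3, using a hypothetical table items_wide with a ~500-byte payload column. The column name, its width and the smaller row count are all assumptions, and the exact tipping point depends on the cost settings. The VACUUM matters because index-only scans are costed using the visibility map, which a freshly loaded table has not populated yet:

```sql
-- Hypothetical wide-row variant: the payload makes each heap row much wider
-- than the integer-only index entries. 500 bytes stays inline (below the
-- roughly 2 kB TOAST threshold), so it really widens the heap.
CREATE TABLE items_wide AS
  SELECT
    (random()*36000000)::integer AS id,
    (random()*10000)::integer AS type,
    repeat('x', 500) AS payload
  FROM
    generate_series(1,1000000);
CREATE INDEX items_wide_type_idx ON items_wide USING btree ("type");

-- Sets the visibility map (needed for cheap index-only scans) and statistics.
VACUUM ANALYZE items_wide;

explain analyze select count(*) from items_wide group by "type";
```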


asked Jul 06 '17 17:07

Denis Drozdov

1 Answer

The "Index-only scans" page of the PostgreSQL wiki says:

It is important to realise that the planner is concerned with minimising the total cost of the query. With databases, the cost of I/O typically dominates. For that reason, "count(*) without any predicate" queries will only use an index-only scan if the index is significantly smaller than its table. This typically only happens when the table's row width is much wider than some indexes'.

and

Index-only scans are only used when the planner surmises that that will reduce the total amount of I/O required, according to its imperfect cost-based modelling. This all heavily depends on visibility of tuples, if an index would be used anyway (i.e. how selective a predicate is, etc), and if there is actually an index available that could be used by an index-only scan in principle
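The visibility point can be checked directly: the planner estimates how many heap fetches an index-only scan will need from the fraction of table pages marked all-visible (relallvisible / relpages in pg_class). A table that has been loaded but not yet vacuumed has relallvisible = 0, which makes the index-only scan look expensive. A minimal sketch for the question's table:

```sql
-- Planner inputs: pages, estimated rows, and pages marked all-visible.
SELECT relname, relpages, reltuples, relallvisible
FROM pg_class
WHERE relname IN ('items', 'items_type_idx');

-- VACUUM updates the visibility map (and relallvisible); ANALYZE the stats.
VACUUM ANALYZE items;
```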

Accordingly, your index is not considered "significantly smaller" than the table, and since the entire dataset has to be read anyway, the planner chooses a seq scan.
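To see how much smaller the index actually is, compare the on-disk sizes of the heap and the index. A sketch using the names from the question; what counts as "significantly smaller" is not a fixed threshold but falls out of the cost model and settings:

```sql
-- Heap vs. index size: a small ratio means reading the index saves
-- little I/O compared with reading the whole table sequentially.
SELECT pg_size_pretty(pg_relation_size('items'))          AS table_size,
       pg_size_pretty(pg_relation_size('items_type_idx')) AS index_size,
       round(pg_relation_size('items')::numeric
             / pg_relation_size('items_type_idx'), 1)     AS table_to_index_ratio;
```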


answered Oct 26 '22 02:10

JGH
