Postgres choosing BTREE instead of BRIN index

Tags:

I'm running Postgres 9.5 and am playing around with BRIN indexes. I have a fact table with about 150 million rows and I'm trying to get PG to use a BRIN index. My query is:

select sum(transaction_amt), 
       sum (total_amt) 
from fact_transaction 
where transaction_date_key between 20170101 and 20170201

I created both a BTREE index and a BRIN index (default pages_per_range value of 128) on column transaction_date_key (the above query is referring to January to February 2017). I would have thought that PG would choose to use the BRIN index however it goes with the BTREE index. Here is the explain plan:

https://explain.depesz.com/s/uPI

I then deleted the BTREE index, did a vacuum / analyze on the the table, and re-ran the query and it did choose the BRIN index however the run time was considerably longer:

https://explain.depesz.com/s/5VXi

In fact my tests were all faster when using the BTREE index rather than the BRIN index. I thought it was supposed to be the opposite?

I'd prefer to use the BRIN index because of its smaller size however I can't seem to get PG to use it.

Note: I loaded the data, starting from January 2017 through to June 2017 (defined via transaction_date_key) as I read that physical table ordering makes a difference when using BRIN indexes.

Does anyone know why PG is choosing to use the BTREE index and why BRIN is so much slower in my case?

962

asked Feb 07 '17 19:02

Ryan

1 Answers

It seems like the BRIN index scan is not very selective – it returns 30 million rows, all of which have to be re-checked, which is where the time is spent.

That probably means that transaction_date_key is not well correlated with the physical location of the rows in the table.

A BRIN index works by “lumping together” ranges of table blocks (how many can be configured with the storage parameter pages_per_range, whose default value is 128). The maximum and minimum of the indexed value for eatch range of blocks is stored.

So a lot of block ranges in your table contain transaction_date_key between 20170101 and 20170201, and all of these blocks have to be scanned to compute the query result.

I see two options to improve the situation:

Lower the pages_per_range storage parameter. That will make the index bigger, but it will reduce the number of “false positive” blocks.
Cluster the table on the transaction_date_key attribute. As you have found out, that requires (at least temporarily) a B-tree index on the column.

102

answered Sep 21 '22 01:09

Laurenz Albe

Related questions
                            
                                Failed to validate connection (This connection has been closed.). Possibly consider using a shorter maxLifetime value
                            
                                PostgreSQL "IF" syntax error
                            
                                How can I tell if autovacuum is running on Windows Server 2003 for my PGSQL database?
                            
                                Why is PostgreSQL saying a constraint doesn't exist when I try to drop it?
                            
                                psql: permission denied for database "dbname" ("User does not have CONNECT privilege.") / "unrecognized role option 'connect'"
                            
                                How to configure datasource in wildfly 10?
                            
                                Atomically set SERIAL value when committing transaction
                            
                                Should I default jsonb to '{}' or {} in migration
                            
                                Django Test -- Unable to drop and recreate test database
                            
                                refresh materialized view periodically postgres
                            
                                multiprocessing / psycopg2 TypeError: can't pickle _thread.RLock objects
                            
                                How to generate a virtual table to generate a sequence of dates in PostgreSQL?
                            
                                Speeding up a group by date query on a big table in postgres
                            
                                PLPGSQL array indexing start at 1?
                            
                                How to clone a test database from a production one in one single action?
                            
                                How to find the URL path to a local postgres database?
                            
                                Insert row if not exists leads to race condition?
                            
                                equivalent of PostgreSQL type "timestamp without time zone" in SQL Server
                            
                                How do I track transactions in PostgreSQL log?
                            
                                PostgreSQL support for timestamps to nanosecond resolution

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Postgres choosing BTREE instead of BRIN index

Tags:

indexing

postgresql

postgresql-9.5

Ryan

People also ask

1 Answers

Laurenz Albe

Recent Activity

Donate For Us