I'm running Postgres 9.5 and am playing around with BRIN indexes. I have a fact table with about 150 million rows and I'm trying to get PG to use a BRIN index. My query is:
select sum(transaction_amt),
sum (total_amt)
from fact_transaction
where transaction_date_key between 20170101 and 20170201
I created both a BTREE index and a BRIN index (default pages_per_range value of 128) on column transaction_date_key (the above query is referring to January to February 2017). I would have thought that PG would choose to use the BRIN index however it goes with the BTREE index. Here is the explain plan:
https://explain.depesz.com/s/uPI
I then deleted the BTREE index, did a vacuum / analyze on the the table, and re-ran the query and it did choose the BRIN index however the run time was considerably longer:
https://explain.depesz.com/s/5VXi
In fact my tests were all faster when using the BTREE index rather than the BRIN index. I thought it was supposed to be the opposite?
I'd prefer to use the BRIN index because of its smaller size however I can't seem to get PG to use it.
Note: I loaded the data, starting from January 2017 through to June 2017 (defined via transaction_date_key) as I read that physical table ordering makes a difference when using BRIN indexes.
Does anyone know why PG is choosing to use the BTREE index and why BRIN is so much slower in my case?
PostgreSQL includes an implementation of the standard btree (multi-way balanced tree) index data structure. Any data type that can be sorted into a well-defined linear order can be indexed by a btree index.
PostgreSQL B-Tree indexes are multi-level tree structures, where each level of the tree can be used as a doubly-linked list of pages. A single metapage is stored in a fixed position at the start of the first segment file of the index. All other pages are either leaf pages or internal pages.
Hash Index Advantages Over B-Tree Indexes The first advantage is the hash index can locate the key values being searched for without the need to traverse any type of tree data structure. This can be advantageous for large tables because of lowered I/O to find the necessary records.
A B-tree index creates a multi-level tree structure that breaks a database down into fixed-size blocks or pages. Each level of this tree can be used to link those pages via an address location, allowing one page (known as a node, or internal page) to refer to another with leaf pages at the lowest level.
It seems like the BRIN index scan is not very selective – it returns 30 million rows, all of which have to be re-checked, which is where the time is spent.
That probably means that transaction_date_key
is not well correlated with the physical location of the rows in the table.
A BRIN index works by “lumping together” ranges of table blocks (how many can be configured with the storage parameter pages_per_range
, whose default value is 128). The maximum and minimum of the indexed value for eatch range of blocks is stored.
So a lot of block ranges in your table contain transaction_date_key
between 20170101
and 20170201
, and all of these blocks have to be scanned to compute the query result.
I see two options to improve the situation:
Lower the pages_per_range
storage parameter. That will make the index bigger, but it will reduce the number of “false positive” blocks.
Cluster the table on the transaction_date_key
attribute. As you have found out, that requires (at least temporarily) a B-tree index on the column.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With