Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres choosing BTREE instead of BRIN index

I'm running Postgres 9.5 and am playing around with BRIN indexes. I have a fact table with about 150 million rows and I'm trying to get PG to use a BRIN index. My query is:

select sum(transaction_amt), 
       sum (total_amt) 
from fact_transaction 
where transaction_date_key between 20170101 and 20170201 

I created both a BTREE index and a BRIN index (default pages_per_range value of 128) on column transaction_date_key (the above query is referring to January to February 2017). I would have thought that PG would choose to use the BRIN index however it goes with the BTREE index. Here is the explain plan:

https://explain.depesz.com/s/uPI

I then deleted the BTREE index, did a vacuum / analyze on the the table, and re-ran the query and it did choose the BRIN index however the run time was considerably longer:

https://explain.depesz.com/s/5VXi

In fact my tests were all faster when using the BTREE index rather than the BRIN index. I thought it was supposed to be the opposite?

I'd prefer to use the BRIN index because of its smaller size however I can't seem to get PG to use it.

Note: I loaded the data, starting from January 2017 through to June 2017 (defined via transaction_date_key) as I read that physical table ordering makes a difference when using BRIN indexes.

Does anyone know why PG is choosing to use the BTREE index and why BRIN is so much slower in my case?

like image 962
Ryan Avatar asked Feb 07 '17 19:02

Ryan


People also ask

Does Postgres use B-tree or B+ tree?

PostgreSQL includes an implementation of the standard btree (multi-way balanced tree) index data structure. Any data type that can be sorted into a well-defined linear order can be indexed by a btree index.

Does Postgres use B trees?

PostgreSQL B-Tree indexes are multi-level tree structures, where each level of the tree can be used as a doubly-linked list of pages. A single metapage is stored in a fixed position at the start of the first segment file of the index. All other pages are either leaf pages or internal pages.

What is the advantage of a hash index over a B-tree index in PostgreSQL?

Hash Index Advantages Over B-Tree Indexes The first advantage is the hash index can locate the key values being searched for without the need to traverse any type of tree data structure. This can be advantageous for large tables because of lowered I/O to find the necessary records.

What is index using btree?

A B-tree index creates a multi-level tree structure that breaks a database down into fixed-size blocks or pages. Each level of this tree can be used to link those pages via an address location, allowing one page (known as a node, or internal page) to refer to another with leaf pages at the lowest level.


1 Answers

It seems like the BRIN index scan is not very selective – it returns 30 million rows, all of which have to be re-checked, which is where the time is spent.

That probably means that transaction_date_key is not well correlated with the physical location of the rows in the table.

A BRIN index works by “lumping together” ranges of table blocks (how many can be configured with the storage parameter pages_per_range, whose default value is 128). The maximum and minimum of the indexed value for eatch range of blocks is stored.

So a lot of block ranges in your table contain transaction_date_key between 20170101 and 20170201, and all of these blocks have to be scanned to compute the query result.

I see two options to improve the situation:

  • Lower the pages_per_range storage parameter. That will make the index bigger, but it will reduce the number of “false positive” blocks.

  • Cluster the table on the transaction_date_key attribute. As you have found out, that requires (at least temporarily) a B-tree index on the column.

like image 102
Laurenz Albe Avatar answered Sep 21 '22 01:09

Laurenz Albe