Adding support for additional data types to Bloom Indexes

Question

Postgres 13.4.

I've got a setup where a Bloom index on a mixture of field types, including uuid could prove very helpful. Out of the box, the Bloom extension supports only int4 and text. How do you add support for additional types?

Morris de Oryx · Accepted Answer

Per Laurenz Albe's comments, I've reworked the post into a Q & A format.

Every couple of years, I work through that fantastic ten-part series on indexing from the folks at Postgres Professional. I was looking at the final piece on Bloom filters (https://habr.com/en/company/postgrespro/blog/452968/), and spotted an example of how to extended support to additional data types. It caught my eye as I've not figured out CREATE OPERATOR CLASS before. I also haven't found these details mentioned in other discussion of Postgres' Bloom indexes. I figured I'd write up some notes for the sake of the archives and, hopefully, to attract some additional information and/or corrections.

Indexes in Postgres are....flexible

The indexing options in Postgres are definitely one of its ten top five features. The variety of basic index types/frameworks and variations are incredible. Unfortunately, the index-related parts of the pg_catalog are pretty dense. If you've got something you want to index, and the data type isn't supported, it seems that you can add support for additional types with CREATE OPERATOR CLASS. I'd never figured out how to do this, until seeing the examples at https://habr.com/en/company/postgrespro/blog/452968/. While this example, and the code below, are specifically for Bloom filter indexes, I'm hoping that most of the mechanics transfer over to other index types.

Bloom Filters

I don't need a Bloom filter today, and may never want one. But, let's be honest, they are deeply cool*. And a great feature to know about in Postgres because they solve a problem that you would not normally expect an RDBMS to have a solution for. (Wide columns, random search combinations across many, many, fields.)

I've not seen Bloom filters in RDMBS systems before. I have seen them in in distributed log aggregation systems, where they're helpful at reducing unnecessary and expensive loads of data from remote locations, or from cold storage. Postgres Bloom filters are different than that.

I might have a good use for a Bloom index as we've got a wide table with 13 UUID columns that are searched on in any number of combinations, along with a variety of int4, and other data. I like to experiment with this stuff in advance so that, if a problem comes up, I already have possible tools in mind. And, a Bloom filter is not what a DBA would normally think of immediately. However, a Bloom index is a great example case for CREATE OPERATOR CLASS. To get it up and running requires only three steps:

Install the bloom extension, if it isn't already installed.
Define a new operator class for the data type of interest and the bloom access method. This requires finding a suitable hashing function.
Create your new index using the new operator class.

Assuming you've installed the bloom extension (https://www.postgresql.org/docs/current/bloom.html), here's a little search to show the operator classes associated with bloom indexes:

select pg_am.amname,
       pg_opclass.opcintype::regtype,
       pg_opclass.opcname,
       pg_am.amhandler::regproc

from pg_opclass
left join pg_am on pg_am.oid = pg_opclass.opcmethod
where amname = 'bloom'
order by amname,opcname;

+--------+-----------+---------------+-----------+
| amname | opcintype | opcname       | amhandler |
+--------+-----------+---------------+-----------+
| bloom  | integer   | int4_ops      | blhandler |
| bloom  | text      | text_ops      | blhandler |
+--------+-----------+---------------+-----------+

So, no help for uuid data.

Adding an Operator Class

Here's the code needed for this simple case:

CREATE OPERATOR CLASS uuid_ops
DEFAULT FOR TYPE uuid USING bloom AS
  OPERATOR  1  =(uuid,uuid),
  FUNCTION  1  uuid_hash;

I think that the code above means "OPERATOR strategy 1 EQUALITY hash", more-or-less. I'm guessing that adding an operator class to an index type with a wider range of operators is more involved. Bloom filters only support "might exist/does not exist" logic, so there's just the one declaration needed, and the hash function to match. If you run the previous again, it now returns an entry for the new operator class:

+--------+-----------+---------------+-----------+
| amname | opcintype | opcname       | amhandler |
+--------+-----------+---------------+-----------+
| bloom  | integer   | int4_ops      | blhandler |
| bloom  | text      | text_ops      | blhandler |
| bloom  | uuid      | uuid_ops      | blhandler |
+--------+-----------+---------------+-----------+

Finding a Suitable Hash Functions

I was told some time back that while not listed in the docs, the Postgres source code includes a wide range of type-specific hash functions. There's a whole stack of type-specific function implemented for hash indexes:

https://doxygen.postgresql.org/hashfunc_8c_source.html

I don't see uuid in there, but a hash is implemented here:

https://github.com/postgres/postgres/blob/REL_13_STABLE/src/backend/utils/adt/uuid.c#L403

All you need to know is the name of the function to supply CREATE OPERATOR CLASS, not where it's implemented in the source.

Using the new Operator Class

With this in place, a Bloom index can be created on uuid data. Is this a good idea? No idea. Do I need a Bloom index today? No. But I like to work through Postgres' features to know what's available when I do have a suitable problem. Indexes in Postgres are so much more extensive than in anything else I know, or have seen...they take some study. Below is an example that uses the new operator class on a Bloom index that includes ten uuid fields followed by three int4 fields.

create index analytic_work_bloom_idx on data.analytic_work
using bloom(
             id,
             hsys_id,
             facility_id,
             inv_id,
             user_id,
             activity_id,
             assembly_id,
             q_event_id,
             scan_id,
             scase_id,

             expected_count,
             actual_count,
             missing_count)

with (length=96,
           col1  = 7,
           col2  = 7,
           col3  = 7,
           col4  = 7,
           col5  = 7,
           col6  = 7,
           col7  = 7,
           col8  = 7,
           col9  = 7,
           col10 = 7,
           col11 = 7,
           col12 = 7,
           col13 = 7);

select pg_size_pretty(pg_total_relation_size('analytic_work_bloom_idx'));  --- Curious.

Don't take the example above as a great example of how to figure the right signature length and bytes-per-column for a Bloom index. There are lots of articles out there explaining the mechanics, the theory, and proposing methods for tuning the signature length and bytes-per-column. Past my pay grade.

Crude Results

I had about 600K+ rows in a sample database, and added the Bloom index. Results should be pretty variable but, in my case, I found that:

Index build speed was fine.
The bloom index on 13 fields was about 4x larger than a B-tree on one uuid field.
The Bloom index cut a random multi-field query time in half. Your Millage Will Vary.

Additions and Corrections Please

I'd be glad to see any sort of extra information or corrections that people have to offer.

Adding support for additional data types to Bloom Indexes

Tags:

operators

indexing

postgresql

bloom-filter

Morris de Oryx

1 Answers

Indexes in Postgres are....flexible

Bloom Filters

Adding an Operator Class

Finding a Suitable Hash Functions

Using the new Operator Class

Crude Results

Additions and Corrections Please

Morris de Oryx

Recent Activity

Donate For Us

Adding support for additional data types to Bloom Indexes

Tags:

operators

indexing

postgresql

bloom-filter

Morris de Oryx

1 Answers

Indexes in Postgres are....flexible

Bloom Filters

Adding an Operator Class

Finding a Suitable Hash Functions

Using the new Operator Class

Crude Results

Additions and Corrections Please

Morris de Oryx

Related questions

Recent Activity

Donate For Us