Query Postgres table by Block Range Index (BRIN) identifier directly

Question

I have N client machines. I want to load each of machine with distinct partition of BRIN index.

That requires to:

create BRIN with predefined number of partitions - equal to number of client machines
send queries from clients which uses WHERE on BRIN partitions identifier instead of filter on indexed column

The main goal is performance improvement when loading single table from postgres into distributed client machines, keeping equal number of rows between the clients - or close to equal if rows count not divides by machines count.

I can achieve it currently by maintaining new column which chunks my table into number of buckets equal to number of client machines (or use row_number() over (order by datetime) % N on the fly). This way it would not be efficient in timing and memory, and the BRIN index looks like a nice feature which could speed up such use cases.

Minimal reproducible example for 3 client machines:

CREATE TABLE bigtable (datetime TIMESTAMPTZ, value TEXT);
INSERT INTO bigtable VALUES ('2015-12-01 00:00:00+00'::TIMESTAMPTZ, 'txt1');
INSERT INTO bigtable VALUES ('2015-12-01 05:00:00+00'::TIMESTAMPTZ, 'txt2');
INSERT INTO bigtable VALUES ('2015-12-02 02:00:00+00'::TIMESTAMPTZ, 'txt3');
INSERT INTO bigtable VALUES ('2015-12-02 03:00:00+00'::TIMESTAMPTZ, 'txt4');
INSERT INTO bigtable VALUES ('2015-12-02 05:00:00+00'::TIMESTAMPTZ, 'txt5');
INSERT INTO bigtable VALUES ('2015-12-02 16:00:00+00'::TIMESTAMPTZ, 'txt6');
INSERT INTO bigtable VALUES ('2015-12-02 23:00:00+00'::TIMESTAMPTZ, 'txt7');

Expected output:

client 1

2015-12-01 00:00:00+00, 'txt1'
2015-12-01 05:00:00+00, 'txt2'
2015-12-02 02:00:00+00, 'txt3'

client 2

2015-12-02 03:00:00+00, 'txt4'
2015-12-02 05:00:00+00, 'txt5'

client 3

2015-12-02 16:00:00+00, 'txt6'
2015-12-02 23:00:00+00, 'txt7'

The question:
How can I create BRIN with predefined number of partitions and run queries which filters on partition identifiers instead of filtering on index column?
Optionally any other way that BRIN (or other pg goodies) can speed up the task of parallel loading multiple clients from single table?

jjanes · Accepted Answer

It sounds like you want to shard a table over many machines, and have each local table (one shard of the global table) have a BRIN index with exactly one bucket. But that does not make any sense. If the single BRIN index range covers the entire (local) table, then it can never be very helpful.

It sounds like what you are looking for is partitioning with CHECK constraints that can be used for partition-exclusion. PostgreSQL has supported that for a long time with table inheritance (although not for each partition being on a separate machine). Using this method, the range covered in the CHECK constraint has to be set explicitly for each partition. This ability to explicitly specify the bounds sounds like it exactly what you are looking for, just using a different technology.

But, the partition exclusion constraint code doesn't work well with modulus. The code is smart enough to know that WHERE id=5 only needs to check the CHECK (id BETWEEN 1 and 10) partition, because it knows that id=5 implies that id is between 1 and 10. More accurately, it know that contrapositive of that.

But the code was never written to know that WHERE id=5 implies that id%10 = 5%10, even though humans know that. So if you build your partitions on modulus operators, like CHECK (id%10=5) rather than on ranges, you would have to sprinkle all your queries with WHERE id = $1 and id % 10= $1 %10 if you wanted it to take advantage of the constraints.

Query Postgres table by Block Range Index (BRIN) identifier directly

Tags:

sql

postgresql

postgresql-9.5

jangorecki

1 Answers

jjanes

Recent Activity

Donate For Us

Query Postgres table by Block Range Index (BRIN) identifier directly

Tags:

sql

postgresql

postgresql-9.5

jangorecki

1 Answers

jjanes

Related questions

Recent Activity

Donate For Us