Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres - create table very slow - with partition and BRIN index

I have a table with over 260 million records. I have created partition for the table and also indexed it.

CREATE TABLE qt_al_90k
(
    rec_id integer,
    user_id integer,
    user_text text,
    user_number double precision,
    user_date date,
    user_seq integer,
    my_sequence integer
) PARTITION BY RANGE (rec_id);

Partition queries:

CREATE TABLE qt_al_90k_rec_id_1 PARTITION OF qt_al_90k FOR VALUES FROM (0) TO (100000);
CREATE TABLE qt_al_90k_rec_id_2 PARTITION OF qt_al_90k FOR VALUES FROM (100000) TO (200000);
CREATE TABLE qt_al_90k_rec_id_3 PARTITION OF qt_al_90k FOR VALUES FROM (200000) TO (300000);
CREATE TABLE qt_al_90k_rec_id_4 PARTITION OF qt_al_90k FOR VALUES FROM (300000) TO (400000);
CREATE TABLE qt_al_90k_rec_id_5 PARTITION OF qt_al_90k FOR VALUES FROM (400000) TO (500000);
CREATE TABLE qt_al_90k_rec_id_6 PARTITION OF qt_al_90k FOR VALUES FROM (500000) TO (600000);
CREATE TABLE qt_al_90k_rec_id_7 PARTITION OF qt_al_90k FOR VALUES FROM (600000) TO (700000);
CREATE TABLE qt_al_90k_rec_id_8 PARTITION OF qt_al_90k FOR VALUES FROM (700000) TO (800000);
CREATE TABLE qt_al_90k_rec_id_9 PARTITION OF qt_al_90k FOR VALUES FROM (800000) TO (900000);
CREATE TABLE qt_al_90k_rec_id_10 PARTITION OF qt_al_90k FOR VALUES FROM (900000) TO (1000000);
CREATE TABLE qt_al_90k_rec_id_11 PARTITION OF qt_al_90k FOR VALUES FROM (1000000) TO (1100000);
CREATE TABLE qt_al_90k_rec_id_12 PARTITION OF qt_al_90k FOR VALUES FROM (1100000) TO (1200000);
CREATE TABLE qt_al_90k_rec_id_13 PARTITION OF qt_al_90k FOR VALUES FROM (1200000) TO (1300000);
CREATE TABLE qt_al_90k_rec_id_14 PARTITION OF qt_al_90k FOR VALUES FROM (1300000) TO (1400000);
CREATE TABLE qt_al_90k_rec_id_15 PARTITION OF qt_al_90k FOR VALUES FROM (1400000) TO (1500000);
CREATE TABLE qt_al_90k_rec_id_16 PARTITION OF qt_al_90k FOR VALUES FROM (1500000) TO (1600000);
CREATE TABLE qt_al_90k_rec_id_17 PARTITION OF qt_al_90k FOR VALUES FROM (1600000) TO (1700000);
CREATE TABLE qt_al_90k_rec_id_18 PARTITION OF qt_al_90k FOR VALUES FROM (1700000) TO (1800000);
CREATE TABLE qt_al_90k_rec_id_19 PARTITION OF qt_al_90k FOR VALUES FROM (1800000) TO (1900000);
CREATE TABLE qt_al_90k_rec_id_20 PARTITION OF qt_al_90k FOR VALUES FROM (1900000) TO (2000000);
CREATE TABLE qt_al_90k_rec_id_21 PARTITION OF qt_al_90k FOR VALUES FROM (2000000) TO (2100000);
CREATE TABLE qt_al_90k_rec_id_22 PARTITION OF qt_al_90k FOR VALUES FROM (2100000) TO (2200000);
CREATE TABLE qt_al_90k_rec_id_23 PARTITION OF qt_al_90k FOR VALUES FROM (2200000) TO (2300000);
CREATE TABLE qt_al_90k_rec_id_24 PARTITION OF qt_al_90k FOR VALUES FROM (2300000) TO (2400000);
CREATE TABLE qt_al_90k_rec_id_25 PARTITION OF qt_al_90k FOR VALUES FROM (2400000) TO (2500000);
CREATE TABLE qt_al_90k_rec_id_26 PARTITION OF qt_al_90k FOR VALUES FROM (2500000) TO (2600000);
CREATE TABLE qt_al_90k_rec_id_27 PARTITION OF qt_al_90k FOR VALUES FROM (2600000) TO (3000000);

Index queries:

CREATE INDEX qt_al_90k_user_id_1 ON qt_al_90k_rec_id_1 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_1 ON qt_al_90k_rec_id_1 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_2 ON qt_al_90k_rec_id_2 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_2 ON qt_al_90k_rec_id_2 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_3 ON qt_al_90k_rec_id_3 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_3 ON qt_al_90k_rec_id_3 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_4 ON qt_al_90k_rec_id_4 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_4 ON qt_al_90k_rec_id_4 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_5 ON qt_al_90k_rec_id_5 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_5 ON qt_al_90k_rec_id_5 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_6 ON qt_al_90k_rec_id_6 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_6 ON qt_al_90k_rec_id_6 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_7 ON qt_al_90k_rec_id_7 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_7 ON qt_al_90k_rec_id_7 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_8 ON qt_al_90k_rec_id_8 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_8 ON qt_al_90k_rec_id_8 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_9 ON qt_al_90k_rec_id_9 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_9 ON qt_al_90k_rec_id_9 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_10 ON qt_al_90k_rec_id_10 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_10 ON qt_al_90k_rec_id_10 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_11 ON qt_al_90k_rec_id_11 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_11 ON qt_al_90k_rec_id_11 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_12 ON qt_al_90k_rec_id_12 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_12 ON qt_al_90k_rec_id_12 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_13 ON qt_al_90k_rec_id_13 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_13 ON qt_al_90k_rec_id_13 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_14 ON qt_al_90k_rec_id_14 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_14 ON qt_al_90k_rec_id_14 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_15 ON qt_al_90k_rec_id_15 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_15 ON qt_al_90k_rec_id_15 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_16 ON qt_al_90k_rec_id_16 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_16 ON qt_al_90k_rec_id_16 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_17 ON qt_al_90k_rec_id_17 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_17 ON qt_al_90k_rec_id_17 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_18 ON qt_al_90k_rec_id_18 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_18 ON qt_al_90k_rec_id_18 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_19 ON qt_al_90k_rec_id_19 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_19 ON qt_al_90k_rec_id_19 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_20 ON qt_al_90k_rec_id_20 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_20 ON qt_al_90k_rec_id_20 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_21 ON qt_al_90k_rec_id_21 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_21 ON qt_al_90k_rec_id_21 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_22 ON qt_al_90k_rec_id_22 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_22 ON qt_al_90k_rec_id_22 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_23 ON qt_al_90k_rec_id_23 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_23 ON qt_al_90k_rec_id_23 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_24 ON qt_al_90k_rec_id_24 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_24 ON qt_al_90k_rec_id_24 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_25 ON qt_al_90k_rec_id_25 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_25 ON qt_al_90k_rec_id_25 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_26 ON qt_al_90k_rec_id_26 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_26 ON qt_al_90k_rec_id_26 USING brin(rec_id);
CREATE INDEX qt_al_90k_user_id_27 ON qt_al_90k_rec_id_27 USING brin(user_id);
CREATE INDEX qt_al_90k_rec_27 ON qt_al_90k_rec_id_27 USING brin(rec_id);

This is my retreival query:

create table test123 as
select user_id,user_text, rec_id, user_seq 
 from qt_al_90k 
 where rec_id in (
  1492,1493,1494,1495,1496,1497,1498,1499,1500,1501) and 
 user_id in (
  37
 );

This query should return about 0.5 million records. However, this is taking more than 11 minutes to create. When I pass over 100 rec_id, the query never returns.

I have more than 1000 rec_id's to process through this, however, I am not able to do it.

How can I improve the performance of the create table statement? Please help.

EDIT

Here is the EXPLAIN (ANALYZE, BUFFERS)

"Gather  (cost=1607.98..2817973.44 rows=257182 width=29) (actual time=119.210..344297.379 rows=500000 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  Buffers: shared hit=49963 read=43384"
"  I/O Timings: read=509470.717"
"  ->  Parallel Append  (cost=607.98..2791255.24 rows=107159 width=29) (actual time=116.662..343151.672 rows=166667 loops=3)"
"        Buffers: shared hit=49963 read=43384"
"        I/O Timings: read=509470.717"
"        ->  Parallel Bitmap Heap Scan on qt_al_90k_rec_id_1  (cost=607.98..2790719.45 rows=107159 width=29) (actual time=116.661..343137.730 rows=166667 loops=3)"
"              Recheck Cond: ((rec_id = ANY ('{1492,1493,1494,1495,1496,1497,1498,1499,1500,1501}'::integer[])) AND (va_id = 37))"
"              Rows Removed by Index Recheck: 3862086"
"              Heap Blocks: lossy=30808"
"              Buffers: shared hit=49963 read=43384"
"              I/O Timings: read=509470.717"
"              ->  BitmapAnd  (cost=607.98..607.98 rows=4480328 width=0) (actual time=92.004..92.004 rows=0 loops=1)"
"                    Buffers: shared hit=395 read=24"
"                    I/O Timings: read=0.082"
"                    ->  Bitmap Index Scan on qt_al_90k_rec_1  (cost=0.00..116.02 rows=5232168 width=0) (actual time=47.230..47.230 rows=8016640 loops=1)"
"                          Index Cond: (rec_id = ANY ('{1492,1493,1494,1495,1496,1497,1498,1499,1500,1501}'::integer[]))"
"                          Buffers: shared hit=381 read=1"
"                          I/O Timings: read=0.011"
"                    ->  Bitmap Index Scan on qt_al_90k_user_id_1  (cost=0.00..363.11 rows=134190580 width=0) (actual time=44.130..44.130 rows=11761920 loops=1)"
"                          Index Cond: (va_id = 37)"
"                          Buffers: shared hit=14 read=23"
"                          I/O Timings: read=0.071"
"Planning Time: 1.814 ms"
"Execution Time: 344320.891 ms"
like image 852
dang Avatar asked Jan 24 '20 15:01

dang


2 Answers

Is the physical ordering of the rows well correlated with the values of the user_id (or va_id, whatever the true name of the column is) and rec_id columns? If not, then the BRIN indexes you have are pretty much useless. You could retry with BTREE indexes instead. Or maybe better, a multi-column index on (va_id, rec_id). For test purposes, you could build it only on the partition qt_al_90k_rec_id_1.

Your data is poorly cached. But, is that because it can't be well cached (you don't have enough memory) or because it just happens not to be at the moment you ran the query? Since most of the time is reading the table data, making more efficient use of indexes might not help much, as it still needs to read the table data (unless you can get index-only scans by adding the other columns being select as well)

What is your IO system like? If you have RAID or JBOD, then increasing effective_io_concurrency could be helpful.

I have more than 1000 rec_id's to process through this

How many different partitions do those fall in?

It would be nice to see the EXPLAIN for this as well. Since it never finishes, you can't do EXPLAIN (ANALYZE).

Also, how often do you need to do this? This doesn't look like the kind of query you run very often. Building in index just for it might not be warranted, but a better index might be useful for other queries as well.

like image 137
jjanes Avatar answered Sep 30 '22 05:09

jjanes


I guess your partitioning is itself becoming an overhead. What if the records ids are the last ids of each partition.

Better would be to have an index on the rec_id in the same table and use exists not IN clause for fast retrieval in this case.

like image 34
Himanshu Avatar answered Sep 30 '22 07:09

Himanshu