
PostgreSQL parallel bulk INSERT with workers doesn't parallelize

My scenario:

  • 10 workers
  • The database has max_connections set to 100
  • Every worker has its own DB connection (at most 10 connections)
  • Every worker starts a transaction (BEGIN; ... COMMIT;)
  • Every worker inserts data into the same table with a bulk INSERT inside the transaction
  • Data to insert: e.g. 1 million rows
  • Every worker handles batches of 1000 rows

The query every worker runs:

BEGIN;
  INSERT INTO "test_tbl" ("id",...) VALUES 
    (...),(...),...[1000 entries]... RETURNING id;
COMMIT;

Table test_tbl has only the constraint PRIMARY KEY (id), backed by the index CREATE UNIQUE INDEX formulas_pkey ON formulas USING btree (id)

Problem

After many hours of analysis, it seems that each worker waits until another worker has finished its insert. Why can't the workers insert new data into the same table at the same time?

UPDATE

I have removed all constraints and all indexes (primary key, foreign keys, etc.), but the problem remains the same: no parallelization.

asked Feb 17 '17 by phlegx


1 Answer

The fact that there is a primary key means that the database has to check the values of the corresponding column(s) to be UNIQUE and NOT NULL. A second transaction that begins inserting data cannot complete until the first one has finished inserting (otherwise, there could be non-unique values).

If you don't do the whole bulk insert in one transaction per worker, but instead commit in smaller batches (say, 100 rows per transaction), it will work much faster. You will need more round trips between client and database (n calls with 100 rows of data each, instead of one very big call with n*100 rows), but the database will be able to commit much earlier.
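
For instance, each worker could run one short transaction per batch. A minimal sketch: the rows come from generate_series only to make it runnable (the real workers would send their own VALUES lists), and the 100-row batch size is an assumption to tune:

BEGIN;
  -- first batch: the uniqueness check and the commit happen quickly
  INSERT INTO test_tbl (id)
    SELECT g FROM generate_series(1, 100) AS g
  RETURNING id;
COMMIT;

BEGIN;
  -- next batch in its own short transaction
  INSERT INTO test_tbl (id)
    SELECT g FROM generate_series(101, 200) AS g
  RETURNING id;
COMMIT;
-- ...and so on, one small transaction per batch instead of one huge one.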

In PostgreSQL:

reading never blocks writing and writing never blocks reading

... but a transaction that is writing can (and often will) block another transaction that is also writing.
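
You can reproduce this with two psql sessions (a minimal demonstration, assuming the primary key on id from the question):

-- Session 1:
BEGIN;
INSERT INTO test_tbl (id) VALUES (1);  -- takes a lock on the new key, no COMMIT yet

-- Session 2: this statement blocks until session 1 commits or rolls back,
-- because PostgreSQL cannot yet know whether id = 1 will be unique.
BEGIN;
INSERT INTO test_tbl (id) VALUES (1);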

In case you cannot do batch inserts, you can try deferring the PRIMARY KEY constraint check to the end of the transaction. This is done by defining your PRIMARY KEY constraint as DEFERRABLE INITIALLY DEFERRED (which is not the default for PostgreSQL, although it is the SQL standard). See the documentation for CREATE TABLE:

DEFERRABLE
NOT DEFERRABLE

This controls whether the constraint can be deferred. A constraint that is not deferrable will be checked immediately after every command. Checking of constraints that are deferrable can be postponed until the end of the transaction (using the SET CONSTRAINTS command). NOT DEFERRABLE is the default. Currently, only UNIQUE, PRIMARY KEY, EXCLUDE, and REFERENCES (foreign key) constraints accept this clause.
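
A minimal sketch (assuming you can recreate the table; test_tbl and the id column are from the question):

-- Recreate the table with a deferrable primary key.
CREATE TABLE test_tbl (
  id bigint,
  CONSTRAINT test_tbl_pkey PRIMARY KEY (id) DEFERRABLE INITIALLY DEFERRED
);

-- Each worker's transaction now postpones the uniqueness check to COMMIT:
BEGIN;
INSERT INTO test_tbl (id)
  SELECT g FROM generate_series(1, 1000) AS g;
COMMIT;  -- the primary key is checked here, not after every statement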

answered Oct 10 '22 by joanolo