Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres deadlocks on concurrent upserts

We have an application which reads from a data stream and upserts that information into a database. The data is changes which occur on Google Drive which means that many events which impact the same objects can occur very close to each other.

We're running into deadlocks when upserting this information into the database, here is what comes out in the log. I have reconstructed and sanitised the query for readability:

ERROR:  deadlock detected
DETAIL:  Process 10586 waits for ShareLock on transaction 166892743; blocked by process 10597.
  Process 10597 waits for ShareLock on transaction 166892741; blocked by process 10586.
  Process 10586: 
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z')
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

  Process 10597: 
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.167Z'),
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.167Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

HINT:  See server log for query details.
CONTEXT:  while locking tuple (3908269,11) in relation "documents"
STATEMENT:  
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

The schema:

    Column      |            Type             |                             Modifiers
----------------+-----------------------------+-------------------------------------------------------------------
uid             | uuid                        | not null default gen_random_uuid()
date_created    | timestamp without time zone | not null default now()
sequence_id     | bigint                      | not null default nextval('documents__sequence_id__seq'::regclass)
version         | integer                     | not null default 0
source          | text                        | not null
source_id       | text                        | not null
ingestion_date  | timestamp without time zone | not null

Indexes:
    "documents__pkey" PRIMARY KEY, btree (uid)
    "documents__sequence_id__unique" UNIQUE CONSTRAINT, btree (sequence_id)
    "documents__source__source_id__deleted" UNIQUE, btree (source, source_id)
    "documents__ingestion_date__idx" btree (ingestion_date)
    "documents__source_id__source__idx" btree (source_id, source)

I suspect the problem is something like "the first transaction was locking rows with source_id alpha, beta, gamma in sequence, meanwhile the second transaction was locking rows with source_id delta, gamma in the reverse order, and the deadlock happened at the point they both locked gamma and delta", however the timing here is very tight!

What would the solution for this be? Sorting our values list by source_id?

like image 625
Rob Haswell Avatar asked Sep 22 '17 13:09

Rob Haswell


2 Answers

I can think of three solutions:

  1. You insert only one row per statement, but that's inefficient.

  2. You sort the rows before inserting them.

  3. You retry a transaction if it gets a deadlock or serialization error.

I'd prefer the third solution unless the errors happen very often.

like image 110
Laurenz Albe Avatar answered Sep 23 '22 06:09

Laurenz Albe


Your query's syntax allows ordering the values easily:

INSERT INTO documents
          (version, source, source_id, ingestion_date)
   SELECT * FROM (
      VALUES
          (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z')
          (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')
      ) AS v ORDER BY source, source_id

      ON CONFLICT (source, source_id)

This should solve your problem. Performance should be nice, as the sort will be tiny.

like image 22
bobflux Avatar answered Sep 20 '22 06:09

bobflux