Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does a missing primary key/unique key cause deadlock issues on upsert?

I came across a schema and an upsert stored procedure that was causing deadlock issues. I have a general idea about why this is causing deadlock and how to fix it. I can reproduce it but I don't have a clear understanding of the sequence of steps that is causing it. It would be great if someone can explain clearly why this is causing deadlock.

Here is the schema and the stored procedures. This code is being executed on PostgreSQL 9.2.2.

CREATE TABLE counters (                                                                                                                                                                                                                       
  count_type INTEGER NOT NULL,
  count_id   INTEGER NOT NULL,
  count      INTEGER NOT NULL
);


CREATE TABLE primary_relation (
  id INTEGER PRIMARY KEY,
  a_counter INTEGER NOT NULL DEFAULT 0
);

INSERT INTO primary_relation
SELECT i FROM generate_series(1,5) AS i;

CREATE OR REPLACE FUNCTION increment_count(ctype integer, cid integer, i integer) RETURNS VOID
AS $$
BEGIN
    LOOP
        UPDATE counters
         SET count = count + i 
         WHERE count_type = ctype AND count_id = cid;
         IF FOUND THEN
            RETURN;
          END IF; 
        BEGIN
            INSERT INTO counters (count_type, count_id, count)
             VALUES (ctype, cid, i); 
            RETURN;
        EXCEPTION WHEN OTHERS THEN
        END;
    END LOOP;
END;
$$
LANGUAGE PLPGSQL;

CREATE OR REPLACE FUNCTION update_primary_a_count(ctype integer) RETURNS VOID
AS $$
  WITH deleted_counts_cte AS (
      DELETE
      FROM counters
      WHERE count_type = ctype
      RETURNING *
  ), rollup_cte AS (
      SELECT count_id, SUM(count) AS count
      FROM deleted_counts_cte
      GROUP BY count_id
      HAVING SUM(count) <> 0
  )
  UPDATE primary_relation
  SET a_counter = a_counter + rollup_cte.count
  FROM rollup_cte
  WHERE primary_relation.id = rollup_cte.count_id
$$ LANGUAGE SQL;

And here is a python script to reproduce the deadlock.

import os                                                                                                                                                                                                                                     
import random
import time
import psycopg2

COUNTERS = 5 
THREADS = 10
ITERATIONS = 500 

def increment():
  outf = open('synctest.out.%d' % os.getpid(), 'w')
  conn = psycopg2.connect(database="test")
  cur = conn.cursor()
  for i in range(0,ITERATIONS):
    time.sleep(random.random())
    start = time.time()
    cur.execute("SELECT increment_count(0, %s, 1)", [random.randint(1,COUNTERS)])
    conn.commit()
    outf.write("%f\n" % (time.time() - start))
  conn.close()
  outf.close()

def update(n):
  outf = open('synctest.update', 'w')
  conn = psycopg2.connect(database="test")
  cur = conn.cursor()
  for i in range(0,n):
    time.sleep(random.random())
    start = time.time()
    cur.execute("SELECT update_primary_a_count(0)")
    conn.commit()
    outf.write("%f\n" % (time.time() - start))
  conn.close()

pids = []
for i in range(THREADS):
  pid = os.fork()
  if pid != 0:
    print 'Process %d spawned' % pid 
    pids.append(pid)
  else:
    print 'Starting child %d' % os.getpid()
    increment()
    print 'Exiting child %d' % os.getpid()
    os._exit(0)

update(ITERATIONS)
for pid in pids:
  print "waiting on %d" % pid 
  os.waitpid(pid, 0)

# cleanup
update(1)

I recognize that one issue with this is that the upsert will can produce duplicate rows (with multiple writers) which will likely result in some double counting. But why will this result in deadlock?

The error I get from PostgreSQL is something like the following:

process 91924 detected deadlock while waiting for ShareLock on transaction 4683083 after 100.559 ms",,,,,"SQL statement ""UPDATE counters

And the client spews something like this:

psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL:  Process 91924 waits for ShareLock on transaction 4683083; blocked by process 91933.
Process 91933 waits for ShareLock on transaction 4683079; blocked by process 91924.
HINT:  See server log for query details.CONTEXT:  SQL statement "UPDATE counters
         SET count = count + i
         WHERE count_type = ctype AND count_id = cid"
PL/pgSQL function increment_count(integer,integer,integer) line 4 at SQL statement

To fix the issue, you need to add a primary key like so:

ALTER TABLE counters ADD PRIMARY KEY (count_type, count_id);

Any insight would be greatly appreciated. Thanks!

like image 492
Damon Snyder Avatar asked Jan 02 '14 17:01

Damon Snyder


1 Answers

because of the primary key, the number of rows in this table is always <= # threads, and the primary key ensures that no row is repeated.

When you remove the primary key, some of the threads get lagged behind and the number of rows increases, and at the same time rows get repeated. When the rows get repeated, then the update time is longer and 2 or more threads will try to update the same row(s).

Open a new terminal and type:

watch --interval 1 "psql -tc \"select count(*) from counters\" test"

Try this with and without the primary key. When you get the first deadlock, look at the results of the query above. In my case this is what I am left with in the table counters:

test=# select * from counters order by 2;
 count_type | count_id | count 
------------+----------+-------
          0 |        1 |   735
          0 |        1 |   733
          0 |        1 |   735
          0 |        1 |   735
          0 |        2 |   916
          0 |        2 |   914
          0 |        2 |   914
          0 |        3 |   882
          0 |        4 |   999
          0 |        5 |   691
          0 |        5 |   692
(11 rows)
like image 141
Bruno Avatar answered Nov 15 '22 07:11

Bruno