I came across a schema and an upsert stored procedure that were causing deadlocks. I have a general idea of why the deadlocks happen and how to fix them, and I can reproduce them, but I don't have a clear understanding of the sequence of steps involved. It would be great if someone could explain clearly why this causes a deadlock.
Here is the schema and the stored procedures. This code is being executed on PostgreSQL 9.2.2.
CREATE TABLE counters (
    count_type INTEGER NOT NULL,
    count_id INTEGER NOT NULL,
    count INTEGER NOT NULL
);

CREATE TABLE primary_relation (
    id INTEGER PRIMARY KEY,
    a_counter INTEGER NOT NULL DEFAULT 0
);
INSERT INTO primary_relation
SELECT i FROM generate_series(1,5) AS i;
CREATE OR REPLACE FUNCTION increment_count(ctype integer, cid integer, i integer) RETURNS VOID
AS $$
BEGIN
    LOOP
        UPDATE counters
        SET count = count + i
        WHERE count_type = ctype AND count_id = cid;
        IF FOUND THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO counters (count_type, count_id, count)
            VALUES (ctype, cid, i);
            RETURN;
        EXCEPTION WHEN OTHERS THEN
            -- do nothing; loop back and try the UPDATE again
        END;
    END LOOP;
END;
$$
LANGUAGE PLPGSQL;
CREATE OR REPLACE FUNCTION update_primary_a_count(ctype integer) RETURNS VOID
AS $$
WITH deleted_counts_cte AS (
    DELETE
    FROM counters
    WHERE count_type = ctype
    RETURNING *
), rollup_cte AS (
    SELECT count_id, SUM(count) AS count
    FROM deleted_counts_cte
    GROUP BY count_id
    HAVING SUM(count) <> 0
)
UPDATE primary_relation
SET a_counter = a_counter + rollup_cte.count
FROM rollup_cte
WHERE primary_relation.id = rollup_cte.count_id
$$ LANGUAGE SQL;
And here is a python script to reproduce the deadlock.
import os
import random
import time

import psycopg2

COUNTERS = 5
THREADS = 10
ITERATIONS = 500

def increment():
    outf = open('synctest.out.%d' % os.getpid(), 'w')
    conn = psycopg2.connect(database="test")
    cur = conn.cursor()
    for i in range(0, ITERATIONS):
        time.sleep(random.random())
        start = time.time()
        cur.execute("SELECT increment_count(0, %s, 1)", [random.randint(1, COUNTERS)])
        conn.commit()
        outf.write("%f\n" % (time.time() - start))
    conn.close()
    outf.close()

def update(n):
    outf = open('synctest.update', 'w')
    conn = psycopg2.connect(database="test")
    cur = conn.cursor()
    for i in range(0, n):
        time.sleep(random.random())
        start = time.time()
        cur.execute("SELECT update_primary_a_count(0)")
        conn.commit()
        outf.write("%f\n" % (time.time() - start))
    conn.close()
    outf.close()

pids = []
for i in range(THREADS):
    pid = os.fork()
    if pid != 0:
        print 'Process %d spawned' % pid
        pids.append(pid)
    else:
        print 'Starting child %d' % os.getpid()
        increment()
        print 'Exiting child %d' % os.getpid()
        os._exit(0)

update(ITERATIONS)

for pid in pids:
    print "waiting on %d" % pid
    os.waitpid(pid, 0)

# cleanup
update(1)
I recognize that one issue here is that the upsert can produce duplicate rows (with multiple writers), which will likely result in some double counting. But why does this result in a deadlock?
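The duplicate rows arise from a race in the UPDATE-then-INSERT pattern itself, which can be sketched without a database (plain Python; the `rows` list stands in for the counters table, and both writers see the same snapshot because neither sees the other's uncommitted insert):

```python
# Simulate the upsert race: each writer first tries an UPDATE (which sees
# only committed, visible rows) and takes the INSERT branch when no row
# matched. Without a unique constraint, nothing stops both writers from
# inserting the same counter row.

rows = []  # the counters table: list of (count_type, count_id, count)

def matching_rows(ctype, cid, visible_rows):
    """The UPDATE half: indexes of rows that would be matched."""
    return [k for k, (t, c, _) in enumerate(visible_rows)
            if (t, c) == (ctype, cid)]

# Both writers run the UPDATE step against the same (empty) snapshot:
seen_by_a = matching_rows(0, 1, rows)
seen_by_b = matching_rows(0, 1, rows)
assert seen_by_a == [] and seen_by_b == []  # neither found a row to update

# So both take the INSERT branch, and with no primary key both succeed:
rows.append((0, 1, 1))  # writer A's insert commits
rows.append((0, 1, 1))  # writer B's insert also commits -> duplicate

print(rows)
```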
The error I get from PostgreSQL is something like the following:
process 91924 detected deadlock while waiting for ShareLock on transaction 4683083 after 100.559 ms ... SQL statement "UPDATE counters
And the client spews something like this:
psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL: Process 91924 waits for ShareLock on transaction 4683083; blocked by process 91933.
Process 91933 waits for ShareLock on transaction 4683079; blocked by process 91924.
HINT: See server log for query details.
CONTEXT: SQL statement "UPDATE counters
SET count = count + i
WHERE count_type = ctype AND count_id = cid"
PL/pgSQL function increment_count(integer,integer,integer) line 4 at SQL statement
To fix the issue, you need to add a primary key like so:
ALTER TABLE counters ADD PRIMARY KEY (count_type, count_id);
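Independent of the schema fix, a deadlock victim can also simply be retried on the client, since the other transaction has committed and released its locks by the time the victim is aborted. A minimal sketch of that pattern (`with_retry` and `flaky_txn` are hypothetical names, and the exception class here is a local stand-in for `psycopg2.extensions.TransactionRollbackError`):

```python
import random
import time

class TransactionRollbackError(Exception):
    """Local stand-in for psycopg2.extensions.TransactionRollbackError."""

def with_retry(txn_fn, attempts=5):
    # Run one transaction; if the server aborts it as a deadlock victim,
    # back off briefly and try again.
    for attempt in range(attempts):
        try:
            return txn_fn()
        except TransactionRollbackError:
            time.sleep(random.random() * 0.1 * (attempt + 1))
    raise RuntimeError('still deadlocking after %d attempts' % attempts)

# A fake transaction that is chosen as a deadlock victim twice:
calls = []
def flaky_txn():
    calls.append(1)
    if len(calls) < 3:
        raise TransactionRollbackError('deadlock detected')
    return 'committed'

print(with_retry(flaky_txn))  # succeeds on the third attempt
```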
Any insight would be greatly appreciated. Thanks!
Because of the primary key, the number of rows in this table can never exceed the number of distinct counters (here COUNTERS = 5), and the primary key ensures that no (count_type, count_id) pair is repeated.

When you remove the primary key, some of the threads lag behind and the number of rows grows, while at the same time rows get duplicated. Once rows are duplicated, each UPDATE matches, and must lock, more than one row. Two transactions updating the same set of duplicate rows can reach those rows in different orders, so each locks some rows and then blocks waiting for a row the other already holds: a deadlock.
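The cycle reported in the log can be drawn as a wait-for graph. With two duplicate rows for (count_type=0, count_id=1), transaction T1 can lock copy #1 first while T2 locks copy #2 first; each then waits on the other. A minimal sketch of following that cycle (transaction and row names are made up; PostgreSQL's deadlock detector finds the same cycle in its real wait-for graph):

```python
# Who currently holds a row lock, and which row each transaction is
# blocked waiting for. This mirrors the two-process cycle in the log.
holds = {"T1": {"row1_copy1"}, "T2": {"row1_copy2"}}
wants = {"T1": "row1_copy2", "T2": "row1_copy1"}

def waits_for(txn):
    """Which transaction holds the row that `txn` is waiting on?"""
    wanted = wants[txn]
    return next(t for t, locked in holds.items() if wanted in locked)

# Follow the wait-for edges; revisiting a transaction means a cycle,
# which is exactly what the deadlock detector reports.
seen, t = [], "T1"
while t not in seen:
    seen.append(t)
    t = waits_for(t)

print(seen, "-> back to", t)  # T1 waits for T2, T2 waits for T1
```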
Open a new terminal and type:
watch --interval 1 "psql -tc \"select count(*) from counters\" test"
Try this with and without the primary key. When you get the first deadlock, look at the results of the query above. In my case this is what I am left with in the table counters:
test=# select * from counters order by 2;
 count_type | count_id | count
------------+----------+-------
          0 |        1 |   735
          0 |        1 |   733
          0 |        1 |   735
          0 |        1 |   735
          0 |        2 |   916
          0 |        2 |   914
          0 |        2 |   914
          0 |        3 |   882
          0 |        4 |   999
          0 |        5 |   691
          0 |        5 |   692
(11 rows)