Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I avoid this three-way deadlock in Postgres?

I am hitting a three-way deadlock in Postgres, and I don't really understand what is happening that is causing it. The log message is,

    Process 5671 waits for ExclusiveLock on tuple (33,12) of relation 31709 of database 16393; blocked by process 5652.
    Process 5652 waits for ShareLock on transaction 3382643; blocked by process 5670.
    Process 5670 waits for ExclusiveLock on tuple (33,12) of relation 31709 of database 16393; blocked by process 5671.
    Process 5671: UPDATE "user" SET
                    seqno = seqno + 1,
                    push_pending = true
                  WHERE user_id = $1
                  RETURNING seqno
    Process 5652: UPDATE "user" SET
                    seqno = seqno + 1,
                    push_pending = true
                  WHERE user_id = $1
                  RETURNING seqno
    Process 5670: UPDATE "user" SET
                    seqno = seqno + 1,
                    push_pending = true
                  WHERE user_id = $1
                  RETURNING seqno

(Relation 31709 is the "user" table. The user_id is the same in all three transactions.)

This doesn't seem like your run-of-the-mill deadlock you see demonstrated in the documentation. I am not updating multiple rows of this table out of order. I suspect the RETURNING clause has something to do with it, but I can't see exactly why. Any ideas on how I can solve this, or debug it further?

Update for Erwin's questions in the comments: This is Postgres 9.3. There are other commands in this transaction, although I don't believe they are touching the "user" table. There is one trigger on the table for updating an updated_at column with current_timestamp():

CREATE OR REPLACE FUNCTION update_timestamp() RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

I'll look into getting details for the transaction when I can reproduce this; I am analyzing the logs after the fact.

Update #2: I rebuilt Postgres with LOCK_DEBUG and ran with trace_locks=on in an attempt to wrap my head around the order in which locks are taken.

The updated deadlock message is:

Process 54131 waits for ShareLock on transaction 4774; blocked by process 54157.
Process 54157 waits for ExclusiveLock on tuple (1,16) of relation 18150 of database 18136; blocked by process 54131.

I can see the blocking on the ExclusiveLock pretty clearly:

2014-08-05 10:35:15 EDT apiary [54131] 2/316 4773 LOG:  00000: LockAcquire: new: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0) type(ExclusiveLock)
2014-08-05 10:35:15 EDT apiary [54131] 2/316 4773 LOG:  00000: GrantLock: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)
2014-08-05 10:35:15 EDT apiary [54157] 3/188 4774 LOG:  00000: LockAcquire: found: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)
2014-08-05 10:35:15 EDT apiary [54157] 3/188 4774 LOG:  00000: WaitOnLock: sleeping on lock: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(80) req(0,0,0,0,0,0,2)=2 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)

(Format is date program [pid] virtualtxid txid msg)

But I don't see where the ShareLock is created, or why transaction 4773 would be blocking on transaction 4774. I see similar results when querying the pg_locks table: always a transaction waiting on a ShareLock of another transaction which is blocking on a tuple from the first transaction. Any suggestions for how to dig into the source of the ShareLock?

Update 3: I needed to update the LOCK_DEBUG_ENABLED inline function to unconditionally return true to see the ShareLock creations. Once I did that, I started seeing their creation:

2014-08-05 12:53:22 EDT apiary [76705] 2/471 6294 LOG:  00000: LockAcquire: lock [6294,0] ExclusiveLock
2014-08-05 12:53:22 EDT apiary [76705] 2/471 6294 LOG:  00000: LockAcquire: new: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0) type(ExclusiveLock)
2014-08-05 12:53:22 EDT apiary [76705] 2/471 6294 LOG:  00000: GrantLock: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)
2014-08-05 12:53:22 EDT apiary [76706] 4/153 6295 LOG:  00000: LockAcquire: lock [6294,0] ShareLock
2014-08-05 12:53:22 EDT apiary [76706] 4/153 6295 LOG:  00000: LockAcquire: found: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ShareLock)
2014-08-05 12:53:22 EDT apiary [76706] 4/153 6295 LOG:  00000: WaitOnLock: sleeping on lock: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(80) req(0,0,0,0,1,0,1)=2 grant(0,0,0,0,0,0,1)=1 wait(0) type(ShareLock)

But I'm still not really sure why that ShareLock is being created, and why 6295 (in this case) must wait for 6294.

like image 201
Joe Shaw Avatar asked Aug 04 '14 16:08

Joe Shaw


People also ask

How do I stop deadlocks in PostgreSQL?

The first thing to do is to look at the postgres logs. postgres documentation recommends avoiding these kinds of deadlocks by ensuring transactions acquire locks in a consistent order. As before postgres will lock rows in the update table and the locks will be held until the transaction commits or rolls back.

What causes Postgres deadlock?

The reason is that transactions have to wait for one another. If two transactions are in a conflict, PostgreSQL will not resolve the problem immediately, rather it will wait for deadlock_timeout and then trigger the deadlock detection algorithm to resolve the problem.


2 Answers

This might be how the deadlock is happening. Each table has a foreign key on user_id to the user table. When you insert into a table that has a foreign key constraint, postgres needs to lock the row of the referenced table to ensure that it's not deleted while you insert the row that refers to it (and violate the FK contraints on commit). This should be a shared lock now.

It looks like all the insert/updates for tables that refer to user also update the user seq on the user table after the insert on the primary table. These updates require an exclusive lock and get blocked by any shared locks that are not part of the current transaction. If two happen concurrently, they deadlock.

For example, two transaction inserting into media_size and source concurrently might deadlock w/ something like this:

T1                                   T2
-----------------------------------------------------------------------
1. Insert media size
1a. Excl Lock media size row
1b. Shared Lock on user row (FK)
                                     2. Insert Source
                                     2a. Excl Lock source row
                                     2b. Shared lock on user row (FK)

3. Update user seq
3a. Excl Lock on user row (BLOCKS on 2b)
                                     4. Update user seq
                                     4a. Excl Lock on user row (Blocks on 1b)
5. Deadlock

I think switching the update user seq step to be first makes sense as it would force T1 and T2 to block before trying to acquire a shared lock (which it wouldn't need since it has an exclused lock already).

like image 73
Jason Wilder Avatar answered Sep 21 '22 00:09

Jason Wilder


Assuming the default transaction isolation level Read Committed.

The UPDATE statement always locks updated rows. Concurrent transactions trying to update the same row cannot proceed until the first transaction is rolled back or committed.

The RETURNING clause is orthogonal to the problem.

The deadlock would have happened without Process 5671 just as well. That's just another transaction queuing for the same row, that came in between. Process 5670 and Process 5652 are deadlocking, really, most probably due to other commands in the same transaction. A less likely candidate would be triggers on the table.

Try to break transactions into smaller pieces that update tables and rows of same tables in the same, straight order. Then they cannot interlock.

Foreign keys

Since you mention foreign keys in a later comment: those can play their part in a deadlock, too. Postgres 9.3 introduced new locking levels to address that:

FOR KEY SHARE and FOR NO KEY UPDATE

Details in this blog post by Michael Paquier.

Which should render FK constaints less of a problem. Still not ruled out, though.

Latest point release

There have been a number of small fixes to locking mechanisms since version 9.3.0. Upgrade to the latest point release, it might help.

View current locks

In reply to your comment: You can find (most) currently held locks in the system catalog view pg_locks. Be sure to read the manual and possibly more before drawing conclusions.

like image 39
Erwin Brandstetter Avatar answered Sep 20 '22 00:09

Erwin Brandstetter