I am hitting a three-way deadlock in Postgres, and I don't really understand what is causing it. The log message is:
Process 5671 waits for ExclusiveLock on tuple (33,12) of relation 31709 of database 16393; blocked by process 5652.
Process 5652 waits for ShareLock on transaction 3382643; blocked by process 5670.
Process 5670 waits for ExclusiveLock on tuple (33,12) of relation 31709 of database 16393; blocked by process 5671.
Process 5671: UPDATE "user" SET
seqno = seqno + 1,
push_pending = true
WHERE user_id = $1
RETURNING seqno
Process 5652: UPDATE "user" SET
seqno = seqno + 1,
push_pending = true
WHERE user_id = $1
RETURNING seqno
Process 5670: UPDATE "user" SET
seqno = seqno + 1,
push_pending = true
WHERE user_id = $1
RETURNING seqno
(Relation 31709 is the "user" table. The user_id is the same in all three transactions.)
This doesn't seem like your run-of-the-mill deadlock you see demonstrated in the documentation. I am not updating multiple rows of this table out of order. I suspect the RETURNING clause has something to do with it, but I can't see exactly why. Any ideas on how I can solve this, or debug it further?
Update for Erwin's questions in the comments: This is Postgres 9.3. There are other commands in this transaction, although I don't believe they are touching the "user" table. There is one trigger on the table for updating an updated_at column with current_timestamp():
CREATE OR REPLACE FUNCTION update_timestamp() RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
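The function is attached to the table with a plain row-level BEFORE trigger, roughly like this (the trigger name below is just illustrative, not the real one):

CREATE TRIGGER user_update_timestamp    -- illustrative name
BEFORE UPDATE ON "user"
FOR EACH ROW
EXECUTE PROCEDURE update_timestamp();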
I'll look into getting details for the transaction when I can reproduce this; I am analyzing the logs after the fact.
Update #2: I rebuilt Postgres with LOCK_DEBUG and ran with trace_locks=on in an attempt to wrap my head around the order in which locks are taken.
The updated deadlock message is:
Process 54131 waits for ShareLock on transaction 4774; blocked by process 54157.
Process 54157 waits for ExclusiveLock on tuple (1,16) of relation 18150 of database 18136; blocked by process 54131.
I can see the blocking on the ExclusiveLock pretty clearly:
2014-08-05 10:35:15 EDT apiary [54131] 2/316 4773 LOG: 00000: LockAcquire: new: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0) type(ExclusiveLock)
2014-08-05 10:35:15 EDT apiary [54131] 2/316 4773 LOG: 00000: GrantLock: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)
2014-08-05 10:35:15 EDT apiary [54157] 3/188 4774 LOG: 00000: LockAcquire: found: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)
2014-08-05 10:35:15 EDT apiary [54157] 3/188 4774 LOG: 00000: WaitOnLock: sleeping on lock: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(80) req(0,0,0,0,0,0,2)=2 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)
(Format is date program [pid] virtualtxid txid msg)
But I don't see where the ShareLock is created, or why transaction 4773 would be blocking on transaction 4774. I see similar results when querying the pg_locks table: always a transaction waiting on a ShareLock of another transaction which is blocking on a tuple from the first transaction. Any suggestions for how to dig into the source of the ShareLock?
Update #3: I needed to update the LOCK_DEBUG_ENABLED inline function to unconditionally return true to see the ShareLock creations. Once I did that, I started seeing their creation:
2014-08-05 12:53:22 EDT apiary [76705] 2/471 6294 LOG: 00000: LockAcquire: lock [6294,0] ExclusiveLock
2014-08-05 12:53:22 EDT apiary [76705] 2/471 6294 LOG: 00000: LockAcquire: new: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0) type(ExclusiveLock)
2014-08-05 12:53:22 EDT apiary [76705] 2/471 6294 LOG: 00000: GrantLock: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)
2014-08-05 12:53:22 EDT apiary [76706] 4/153 6295 LOG: 00000: LockAcquire: lock [6294,0] ShareLock
2014-08-05 12:53:22 EDT apiary [76706] 4/153 6295 LOG: 00000: LockAcquire: found: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ShareLock)
2014-08-05 12:53:22 EDT apiary [76706] 4/153 6295 LOG: 00000: WaitOnLock: sleeping on lock: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(80) req(0,0,0,0,1,0,1)=2 grant(0,0,0,0,0,0,1)=1 wait(0) type(ShareLock)
But I'm still not really sure why that ShareLock is being created, and why 6295 (in this case) must wait for 6294.
The first thing to do is to look at the Postgres logs. The Postgres documentation recommends avoiding these kinds of deadlocks by ensuring that transactions acquire locks in a consistent order. As before, Postgres will lock rows in the updated table, and those locks will be held until the transaction commits or rolls back.
The reason is that transactions have to wait for one another. If two transactions are in conflict, PostgreSQL will not resolve the problem immediately; rather, it will wait for deadlock_timeout and then trigger the deadlock detection algorithm to resolve the problem.
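Two settings control how visible these waits are in the logs; a minimal sketch (the values here are only examples):

-- How long a backend waits on a lock before running deadlock detection (default 1s)
SHOW deadlock_timeout;

-- In postgresql.conf: log every lock wait that lasts longer than deadlock_timeout
-- log_lock_waits = on

-- Per-session example of raising the timeout while experimenting (superuser only)
SET deadlock_timeout = '2s';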
This might be how the deadlock is happening. Each table has a foreign key on user_id to the user table. When you insert into a table that has a foreign key constraint, Postgres needs to lock the row of the referenced table to ensure that it's not deleted while you insert the row that refers to it (which would violate the FK constraint on commit). As of 9.3, this is a shared lock on the referenced row.
It looks like all of the inserts/updates on tables that refer to user also update the user seq on the user table after the insert on the primary table. Those updates require an exclusive lock and get blocked by any shared locks that are not part of the current transaction. If two such transactions run concurrently, they deadlock.
For example, two transactions inserting into media_size and source concurrently might deadlock with something like this:
T1                                       T2
-----------------------------------------------------------------------
1. Insert media size
   1a. Excl Lock media size row
   1b. Shared Lock on user row (FK)
                                         2. Insert Source
                                            2a. Excl Lock source row
                                            2b. Shared lock on user row (FK)
3. Update user seq
   3a. Excl Lock on user row
       (BLOCKS on 2b)
                                         4. Update user seq
                                            4a. Excl Lock on user row
                                                (Blocks on 1b)
5. Deadlock
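In SQL terms the same interleaving might look like this across two psql sessions; the table and column names here are hypothetical, chosen only to mirror the scenario above:

-- Assumed schema: media_size(user_id REFERENCES "user", ...) and source(user_id REFERENCES "user", ...)

-- Session 1:
BEGIN;
INSERT INTO media_size (user_id, width, height) VALUES (42, 100, 100);
    -- FK check takes a shared lock on the user 42 row

-- Session 2:
BEGIN;
INSERT INTO source (user_id, url) VALUES (42, 'http://example.com');
    -- FK check takes its own shared lock on the same user row

-- Session 1:
UPDATE "user" SET seqno = seqno + 1, push_pending = true WHERE user_id = 42;
    -- needs an exclusive row lock, waits behind session 2's shared lock

-- Session 2:
UPDATE "user" SET seqno = seqno + 1, push_pending = true WHERE user_id = 42;
    -- waits behind session 1's shared lock: deadlock detected, one session is aborted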
I think switching the update-user-seq step to be first makes sense, as it would force T1 and T2 to block before trying to acquire the shared lock (which they wouldn't need anyway, since they would already hold an exclusive lock on the user row).
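A sketch of that reordering, reusing the hypothetical tables from the example above:

BEGIN;
-- Take the exclusive lock on the user row up front
UPDATE "user" SET seqno = seqno + 1, push_pending = true
WHERE user_id = 42
RETURNING seqno;

-- The FK check behind the insert now only needs a lock this transaction already covers
INSERT INTO media_size (user_id, width, height) VALUES (42, 100, 100);
COMMIT;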
Assuming the default transaction isolation level Read Committed.
The UPDATE statement always locks updated rows. Concurrent transactions trying to update the same row cannot proceed until the first transaction is rolled back or committed.
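For instance, in two psql sessions against the same row (a sketch, not taken from your setup):

-- Session A:
BEGIN;
UPDATE "user" SET seqno = seqno + 1 WHERE user_id = 1;
    -- row is now locked until COMMIT or ROLLBACK

-- Session B:
UPDATE "user" SET seqno = seqno + 1 WHERE user_id = 1;
    -- hangs here until session A finishes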
The RETURNING clause is orthogonal to the problem.
The deadlock would have happened without Process 5671 just as well. That's just another transaction queuing for the same row that came in between. Process 5670 and Process 5652 are deadlocking, really, most probably due to other commands in the same transaction. A less likely candidate would be triggers on the table.
Try to break transactions into smaller pieces that update tables, and rows within the same tables, in the same straight order. Then they cannot interlock.
Since you mention foreign keys in a later comment: those can play their part in a deadlock, too. Postgres 9.3 introduced new locking levels to address that: FOR KEY SHARE and FOR NO KEY UPDATE. Details in this blog post by Michael Paquier. These should render FK constraints less of a problem. Still not ruled out, though.
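A rough sketch of why the weaker levels help: the FK check behind an insert behaves roughly like the explicit FOR KEY SHARE below, and a non-key update no longer conflicts with it:

-- Session A: roughly what an FK check on user 42 takes in 9.3
BEGIN;
SELECT 1 FROM "user" WHERE user_id = 42 FOR KEY SHARE;

-- Session B: an update that leaves the key columns alone takes FOR NO KEY UPDATE,
-- which does not conflict with FOR KEY SHARE, so it proceeds without waiting
UPDATE "user" SET seqno = seqno + 1, push_pending = true WHERE user_id = 42;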
There have been a number of small fixes to the locking mechanisms since version 9.3.0. Upgrade to the latest point release; it might help.
In reply to your comment:
You can find (most) currently held locks in the system catalog view pg_locks. Be sure to read the manual and possibly more before drawing conclusions.
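For example, something along these lines (a sketch using the 9.3 column names) lists holders and waiters side by side:

SELECT l.pid,
       l.locktype,
       l.mode,
       l.granted,
       l.relation::regclass AS relation,
       l.transactionid,
       a.query
FROM   pg_locks l
LEFT   JOIN pg_stat_activity a USING (pid)
ORDER  BY l.granted, l.pid;

-- Rows with granted = false are the waiters; match them against the granted rows
-- for the same relation or transactionid to see who is blocking whom.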