I've been running into occasional deadlocks in my application because two transactions which need to update the same rows but in different orders (ex, transaction A updates rows X then Y, while transaction B updates rows Y then X).
For various Reasons, the traditional approaches to resolving avoiding this kind of deadlock – locking, or updating rows in a consistent order – are less than ideal.
Since the updates I'm trying to perform are otherwise idempotent and order-independent, is it safe and reasonable to simply catch these occasional deadlocks at the application level and retry the transaction?
For example:
def process_update(update):
attempt = 0
while attempt < 10:
try:
execute("SAVEPOINT foo")
for row in update:
execute("UPDATE mytable SET … WHERE …", row)
execute("RELEASE SAVEPOINT foo")
break
except Deadlock:
execute("ROLLBACK TO SAVEPOINT foo")
attempt += 1
raise Exception("Too many retries")
Is this a reasonable idea? Or are there costs associated with Postgres' deadlock detection that might make it dangerous?
I did a lot of research and experimentation into this for a system that's running 50 to 100 concurrent processes on the same tables. There are a number of transaction failures that can happen besides basic deadlocks. My case includes both read committed and serializable transactions. There's no situation where handling this at the application level caused any issues. Fortunately Postgres will fail immediately, so the only performance hit is to the application, nothing significant to the database.
The key components are catching every type of error, knowing which cases require a rollback, and having an exponential backoff for retries. I found that immediate retries or static sleep times cause processes to simply deadlock each other repeatedly and cause a bit of a domino effect, which makes sense.
This is the complete logic my system requires to handle every concurrency issue (pseudocode):
begin transaction (either read committed or serializable)
while not successful and count < 5
try
execute sql
commit
except
if error code is '40P01' or '55P03'
# Deadlock or lock not available
sleep a random time (200 ms to 1 sec) * number of retries
else if error code is '40001' or '25P02'
# "In failed sql transaction" or serialized transaction failure
rollback
sleep a random time (200 ms to 1 sec) * number of retries
begin transaction
else if error message is 'There is no active transaction'
sleep a random time (200 ms to 1 sec) * number of retries
begin transaction
increment count
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With