We have an app that uses psycopg2 to write records to RDS PostgreSQL. Occasionally, when a scale-down event stops the container in the middle of an insert commit, we end up with a deadlock on the table. We are using a threaded connection pool with some standard timeouts, as seen below:
from psycopg2 import pool

self._pool = pool.ThreadedConnectionPool(
    mincount,
    maxcount,
    dsn,
    cursor_factory=cursor_factory,
    application_name=application_name or name,
    keepalives_idle=1,
    keepalives_interval=1,
    keepalives_count=5,
    options=f"-c statement_timeout={statement_timeout}s -c idle_in_transaction_session_timeout={idle_in_transaction_session_timeout}s",
)
The idle-in-transaction timeout seems to be working: transactions that come in afterwards throw a timeout error rather than silently waiting, but we're still getting lock issues. Is there a different timeout we should use to have Postgres kill these transactions?
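For what it's worth, one way we sanity-check that the options string is actually applied to pooled connections looks roughly like this (db_pool here stands in for the self._pool instance above):

conn = db_pool.getconn()
try:
    with conn.cursor() as cur:
        # read back the session settings that the options string should have set
        cur.execute("SHOW statement_timeout")
        print("statement_timeout:", cur.fetchone()[0])
        cur.execute("SHOW idle_in_transaction_session_timeout")
        print("idle_in_transaction_session_timeout:", cur.fetchone()[0])
finally:
    db_pool.putconn(conn)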
Update on issue:
We have two different applications: one that writes to the table and another that reads from it. We occasionally see this error pop up in the write application:
deadlock detected
DETAIL: Process 31504 waits for ShareLock on transaction 33994594; blocked by process 28310.
Process 28310 waits for ShareLock on transaction 33994595; blocked by process 31504.
HINT: See server log for query details.
CONTEXT: while inserting
If I pull pg_stat_activity for those PIDs, I get this:
[
{
"datid": 262668,
"datname": "app_db",
"pid": 31504,
"usename": "app",
"application_name": "app-Writer",
"query_start": "2020-10-28 23:16:23.859818",
"state_change": "2020-10-28 23:16:23.865455",
"wait_event_type": "Client",
"wait_event": "ClientRead",
"state": "idle",
"backend_xid": null,
"backend_xmin": null,
"query": "COMMIT",
"backend_type": "client backend"
},
{
"datid": 262668,
"datname": "app_db",
"pid": 28310,
"usename": "app",
"application_name": "app-Writer",
"query_start": "2020-10-28 23:12:01.232097",
"state_change": "2020-10-28 23:12:01.234281",
"wait_event_type": "Client",
"wait_event": "ClientRead",
"state": "idle",
"backend_xid": null,
"backend_xmin": null,
"query": "COMMIT",
"backend_type": "client backend"
}
]
The reader app later fails with this error:
psycopg2.InternalError: terminating connection due to idle-in-transaction timeout
SSL connection has been closed unexpectedly
Both the reader and writer apps have the same timeout settings.
First, if the deadlock occurs only rarely, don't worry too much: all you have to do is teach your application to repeat the transaction if it encounters the deadlock. Read on if you need to get rid of the deadlock.
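A minimal sketch of that retry pattern with psycopg2 could look like the following; the function name, statement handling, and backoff values are illustrative, and the DeadlockDetected exception class requires psycopg2 2.8 or later:

import time
from psycopg2 import errors  # DeadlockDetected is exposed in psycopg2 2.8+

def insert_with_retry(conn, sql, params, max_attempts=3):
    """Run one write transaction, retrying if PostgreSQL reports a deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            with conn:  # commits on success, rolls back on exception
                with conn.cursor() as cur:
                    cur.execute(sql, params)
            return
        except errors.DeadlockDetected:
            if attempt == max_attempts:
                raise
            time.sleep(0.1 * attempt)  # brief backoff before retrying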
The COMMIT you are seeing in pg_stat_activity is a red herring: the query column contains the last statement that was sent on that connection, and that probably was the COMMIT that ended the transaction after the deadlock happened.
Readers and writers never block each other in PostgreSQL, so the deadlock must be between two data modifying transactions.
You should do what the error message tells you and consult the PostgreSQL log file. There you find more information, in particular the statements that were being executed when the deadlock happened. This information is not sent to the client because it may contain sensitive data.
To debug the problem, you have to consider all the statements that were executed in these transactions, because it may well be that earlier statements in the transaction took locks that contributed to the deadlock. Remember that locks are held until the end of the transaction.
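If you can catch the situation while the sessions are still waiting (before the deadlock detector cancels one of them), pg_blocking_pids() shows who is blocked by whom. A rough diagnostic sketch, assuming PostgreSQL 9.6 or later and that dsn is the same connection string the application uses:

import psycopg2

conn = psycopg2.connect(dsn)
try:
    with conn.cursor() as cur:
        # list every session that is currently blocked, and the PIDs blocking it
        cur.execute("""
            SELECT pid,
                   pg_blocking_pids(pid) AS blocked_by,
                   wait_event_type,
                   wait_event,
                   state,
                   query
            FROM pg_stat_activity
            WHERE cardinality(pg_blocking_pids(pid)) > 0
        """)
        for row in cur.fetchall():
            print(row)
finally:
    conn.close()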
If you cannot identify the transactions and what they did from your application code, you could set log_statement = 'all' in PostgreSQL and make sure that the transaction ID (%x) is included in log_line_prefix. That will cause all statements to be logged (beware of performance problems), and when the error happens, you can find all the statements that belong to the involved transactions in the log.
This is cumbersome, but the only way if you cannot find the statements from your application end.
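For example, the relevant settings could look something like this (the exact prefix format is up to you, and on RDS these are changed through the DB parameter group rather than postgresql.conf):

log_statement = 'all'             # log every statement; expect a large log volume
log_line_prefix = '%m [%p] %x '   # timestamp, backend PID, and transaction ID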
Once you know the statements, you can reproduce and debug the problem.