I'm a little bit confused reading about PostgreSQL deadlocks.
A typical deadlock example is:
-- Transaction 1 UPDATE customer SET ... WHERE id = 1 UPDATE customer SET ... WHERE id = 2 -- Transaction 2 UPDATE customer SET ... WHERE id = 2 UPDATE customer SET ... WHERE id = 1
But what if I change the code as follows:
-- Transaction 1 UPDATE customer SET ... WHERE id IN (1, 2) -- Transaction 2 UPDATE customer SET ... WHERE id IN (1, 2)
Will be a possibility of deadlock here?
Essentially my question is: in the 2nd case does PostgreSQL lock rows one-by-one, or lock the entire scope covered by the WHERE
condition?
Thanks in advance!
Google BigQuery & PostgreSQL : Big Query for Data AnalysisRows modified by UPDATE and DELETE are then exclusively locked automatically for the duration of the transaction. This prevents other users from changing the row until the transaction is either committed or rolled back.
In PostgreSQL, when a transaction cannot acquire the requested lock within a certain amount of time (configured by `deadlock_timeout`, with default value of 1 second), it begins deadlock detection.
Update lock (U) is used to avoid deadlocks. Unlike the Exclusive lock, the Update lock places a Shared lock on a resource that already has another shared lock on it.
In PostgreSQL the rows will be locked as they are updated -- in fact, the way this actually works is that each tuple (version of a row) has a system field called xmin
to indicate which transaction made that tuple current (by insert or update) and a system field called xmax
to indicate which transaction expired that tuple (by update or delete). When you access data, it checks each tuple to determine whether it is visible to your transaction, by checking your active "snapshot" against these values.
If you are executing an UPDATE and a tuple which matches your search conditions has an xmin which would make it visible to your snapshot and an xmax of an active transaction, it blocks, waiting for that transaction to complete. If the transaction which first updated the tuple rolls back, your transaction wakes up and processes the row; if the first transaction commits, your transaction wakes up and takes action depending on the current transaction isolation level.
Obviously, a deadlock is the result of this happening to rows in different order. There is no row-level lock in RAM which can be obtained for all rows at the same time, but if rows are updated in the same order you can't have the circular locking. Unfortunately, the suggested IN(1, 2)
syntax doesn't guarantee that. Different sessions may have different costing factors active, a background "analyze" task may change statistics for the table between the generation of one plan and the other, or it may be using a seqscan and be affected by the PostgreSQL optimization which causes a new seqscan to join one already in progress and "loop around" to reduce disk I/O.
If you do the updates one at a time in the same order, in application code or using a cursor, then you will have only simple blocking, not deadlocks. In general, though, relational databases are prone to serialization failures, and it is best to access them through a framework which will recognize them based on SQLSTATE and automatically retry the entire transaction from the start. In PostgreSQL a serialization failure will always have a SQLSTATE of 40001 or 40P01.
http://www.postgresql.org/docs/current/interactive/mvcc-intro.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With