Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Deadlock question

Is it possible in relational databases for these two statements to deadlock? I'm trying to simplify my question and example -- please just assume that these selects, which I think would normally only require sharable read-locking, now require exclusive read locks:

Concurrent Connection 1:
SELECT {...}
FROM A 
JOIN B ON {...}

Concurrent Connection 2:
SELECT {...}
FROM B 
JOIN A ON {...}

That is, does the ordering of the joins matter? Are single statements in SQL atomic? Is A locked first and then B in the first statement and B locked first and then A in the second statement?

I think not - My gut tells me that two single statements like this cannot deadlock, no matter how complex. I believe that a statement is analyzed as a whole and that the resources requiring locking are locked using some deterministic global order (i.e. alphabetically). But I need more than a gut feeling on this - I can't think of a way to prove it and I can't find it documented.

I'm interested in MS SQL 2005, but I don't think the question is implementation specific.

Secondarily: As it relates to MS SQL, I'd also want to know that Common Table Expressions also have this guarantee - that CTEs are mostly a syntactic benefit (+recursion), consolidated into a traditional single statement by the engine.

like image 633
Jason Kleban Avatar asked Jan 22 '23 18:01

Jason Kleban


1 Answers

SELECTs cannot deadlock with other SELECT, because they only acquire shared locks. You say that we should consider that these SELECTs now 'require exclusive read locks', but this is not possible for us to consider because 1) there is no such thing as an exlusive read lock and 2) reads don't acquire exclusive locks.

But you do pose a more general question, whether simple statements can deadlock. The answer is a definite, resounding YES. Locks are acquired at execution, not analyzed upfront and sorted then acquired in some order. It would be impossible for the engine to know upfront the needed locks because they depend on the actual data in on-disk, and to read the data the engine needs to ... lock the data.

Deadlocks between simple statements (SELECt vs. UPDATE or SELECT vs. DELETE) due to different index access order are quite common and very easy to investigate, diagnose and fix. But note that there is always a write operation involved, as reads cannot block each other. For this discussion, adding a UPDLOCK or XLOCK hint to a SELECT should be considered a write. You don't even need a JOIN, a secondary index may well introduce the access order problem leading to deadlock, see Read/Write Deadlock.

And finally, writing SELECT FROM A JOIN B or writing SELECT FROM B JOIN A is completely irrelevant. The query optimizer is free to rearrange the access order as it sees fit, the actual text of the query does not impose the order of execution in any way.

Updated

How then can we construct a general strategy toward a READ COMMITTED "multiple entity" database that doesn't deadlock?

I'm afraid there is no cookie-cutter recipe. The solution will depend from case to case. Ultimately, in database applications deadlocks are a fact of life. I understand this may sound absurd, as in 'we landed on the Moon but we can't write a correct database application', but there are strong factors at play which pretty much guarantee that applications will eventually encounter deadlocks. Lucky deadlocks are the easiest to deal with errors, simple read again the state, apply the logic, re-write the new state. Now that being said, there are some good practices that can dramatically reduce the frequency of deadlocks, down to the point they are all but vanished:

  • Try to have a consistent access pattern for Writes. Have clearly defined rules stating things such as 'a transaction shall always tables in this order: Customers -> OrderHeaders -> OrderLines.' Note that the order has to be obeyed inside a transaction. Basically, rank all tables in your schema and specify that all updates must occur in ranking order. This eventually boils down to code discipline of the individual contributor writing the code, as it has to ensure it writes is update sin the proper order inside a transaction.
  • Reduce the duration of writes. The usual wisdom goes as this: at the beginning of the transaction do all the reads (read the existing state), then process the logic and compute new values, then write all updates at the end of transaction. Avoid a pattern like 'read->write->logic->read->write', instead do 'read->read->logic->write->write'. Of course, the true craftsmanship consist in how to deal with actual, real, individual cases when apparently one must have to do writes mid-transaction. A special note here must be said about a specific type of transaction: those driven by a queue, which by very definition start their activity by dequeueing (= a write) from the queue. These applications were always notoriously difficult to write and prone to errors (specially deadlocks), luckily there are ways to do it, see Using tables as Queues.
  • Reduce the amount of reads. Table scans are the most prevalent cause of deadlocks. Proper indexing will not only eliminate the deadlocks, but may also boost performance in the process.
  • Snapshot isolation. This is the closest thing you'll get to a free lunch in regard to avoiding deadlocks. I intentionally put it last, because it may mask other problems (like improper indexing) instead of fixing them.

Trying to solve this problem with a LockCustomerByXXX approach I'm afraid doesn't work. Pessimistic locking doesn't scale. Optimistic concurrency updates are the way to go if you want to have any sort of decent performance.

like image 137
Remus Rusanu Avatar answered Jan 28 '23 19:01

Remus Rusanu