When using a database transaction to group multiple updates, should I include SELECTs inside the transaction as well? For instance, let's say I:
Should I start the transaction before the "get a record" stage, or just around the updates?
I'm using Postgres/Django transaction.atomic() but I don't think it matters here.
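In a highly concurrent application it could (theoretically) happen that data you read in the first SELECT is modified before the other SELECTs are executed. If that situation could occur in your application, wrap your SELECTs in a transaction. Note that under Postgres's default READ COMMITTED isolation level each statement still sees its own snapshot, so consistent reads across several SELECTs need REPEATABLE READ or stricter.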
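To make the effect of wrapping reads in a transaction concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module in WAL mode as a stand-in for Postgres, chosen only so the example runs without a server. The item table and the function names are hypothetical. In this setup a transaction keeps one snapshot from its first read, which is closest in spirit to REPEATABLE READ in Postgres, not to the default READ COMMITTED.

```python
import os
import sqlite3
import tempfile


def read_price(conn):
    # fetchall() runs the statement to completion before we return.
    return conn.execute("SELECT price FROM item WHERE id = 1").fetchall()[0][0]


def two_reads(use_transaction):
    """Read the same row twice while another session updates it in between."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: autocommit, transactions start only at an explicit BEGIN.
        writer = sqlite3.connect(path, isolation_level=None)
        writer.execute("PRAGMA journal_mode=WAL").fetchall()
        writer.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, price INTEGER)")
        writer.execute("INSERT INTO item VALUES (1, 10)")
        reader = sqlite3.connect(path, isolation_level=None)
        try:
            if use_transaction:
                reader.execute("BEGIN")
            first = read_price(reader)
            # A concurrent session commits a change between the two reads.
            writer.execute("UPDATE item SET price = 99 WHERE id = 1")
            second = read_price(reader)
            if use_transaction:
                reader.execute("COMMIT")
            return first, second
        finally:
            reader.close()
            writer.close()


if __name__ == "__main__":
    print(two_reads(use_transaction=False))
    print(two_reads(use_transaction=True))
```

Without the transaction the second read picks up the concurrent change. Inside the transaction both reads return the value from the same snapshot.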
Avoid using SELECT *. There are several reasons for that recommendation: SELECT * retrieves unnecessary data, which may increase the network traffic used by your queries, and it can return two columns of the same name from two different tables (when using JOINs, for example).
If all you are asking about is what the isolation level does, then understand that all SELECT statements (hey, all statements of any kind) run in a transaction. Without an explicit one, each statement gets its own implicit transaction.
The SELECT statement is used to select data from a database. The data returned is stored in a result table, called the result-set.
The short version: "It depends".
The long version:
If you're doing a read-modify-write cycle, then not only must it be in a transaction, but you must SELECT ... FOR UPDATE any records you later intend to modify. Otherwise you're going to risk lost writes, where you overwrite an update someone else made between when you read the record and when you wrote the update. SERIALIZABLE transaction isolation can also help with this.
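To see a lost write concretely, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres, an assumption made only so the example runs without a server. The account table, the helper names and the amounts are hypothetical. SQLite has no SELECT ... FOR UPDATE, so BEGIN IMMEDIATE plays that role here: it takes a database-wide write lock before the read, which is much coarser than a Postgres row lock.

```python
import os
import sqlite3
import tempfile


def open_session(path):
    # isolation_level=None: autocommit, transactions start only at an explicit BEGIN.
    # timeout=0: fail immediately instead of waiting when the database is locked.
    return sqlite3.connect(path, timeout=0, isolation_level=None)


def balance(conn):
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchall()[0][0]


def deposit(conn, seen, amount):
    # The "write" half of read-modify-write: stores a value computed in the application.
    conn.execute("UPDATE account SET balance = ? WHERE id = 1", (seen + amount,))


def run(scenario):
    """Create a fresh database with balance 100 and run a scenario with two sessions."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        a, b = open_session(path), open_session(path)
        try:
            a.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
            a.execute("INSERT INTO account VALUES (1, 100)")
            return scenario(a, b)
        finally:
            a.close()
            b.close()


def unlocked(a, b):
    seen_a = balance(a)
    seen_b = balance(b)     # both sessions read the same starting balance
    deposit(a, seen_a, 10)
    deposit(b, seen_b, 20)  # overwrites A's update: A's deposit is lost
    return balance(a)


def locked(a, b):
    a.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    seen_a = balance(a)
    try:
        b.execute("BEGIN IMMEDIATE")  # B cannot start its cycle while A holds the lock
        b.execute("ROLLBACK")
        b_was_blocked = False
    except sqlite3.OperationalError:
        b_was_blocked = True
    deposit(a, seen_a, 10)
    a.execute("COMMIT")
    b.execute("BEGIN IMMEDIATE")  # retry: B now reads the value A committed
    deposit(b, balance(b), 20)
    b.execute("COMMIT")
    return b_was_blocked, balance(a)


if __name__ == "__main__":
    print(run(unlocked))
    print(run(locked))
```

In the unlocked scenario one deposit disappears. In the locked scenario the second session is refused until the first commits, and both deposits survive. With Postgres and Django the corresponding tool would be select_for_update() on the queryset inside transaction.atomic(); there the second session normally waits for the row lock rather than failing.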
You really need to understand concurrency and isolation. Unfortunately the only simple, easy "just do X" answer without understanding it is to begin every transaction by locking all the tables involved. Most people don't want to do that.
I suggest a read (or two, or three, or four - it's hard material) of the tx isolation docs. Experiment with concurrent psql sessions (multiple terminals) to create race conditions and conflicts.
Ideally (if possible) you would do all of your four steps in a single data-modifying CTE (which automatically happens inside a single transaction).
That still does not rule out race conditions, just makes them very unlikely, because the time frame between SELECT .. FOR UPDATE and a later UPDATE is minimized. (Yes, you still should use FOR UPDATE (or another appropriate locking level) to counter race conditions under heavy concurrent access.)
This is not the typical (inefficient) approach of a web-framework like Django. But it's the superior approach. It optimizes performance in a number of ways:
When using SELECT .. FOR UPDATE in a data-modifying CTE, be aware that unreferenced SELECT CTEs are not executed at all, so their rows would not be locked as intended.
Code examples for data-modifying CTEs:
There are many more on SO. Try a search.