When using a database transaction to group multiple updates, should I include SELECTs inside the transaction as well? For instance, let's say I:

1. get a record
2. check edit permissions for that record, using data from the record
3. update some records
4. update some other records

Should I start the transaction before the "get a record" stage, or just around the updates? I'm using Postgres/Django transaction.atomic(), but I don't think it matters here.


Should I include SELECTs in a transaction?

2 Answers

The short version: "It depends".

The long version:

If you're doing a read-modify-write cycle, then not only must it be in a transaction, but you must SELECT ... FOR UPDATE any records you later intend to modify. Otherwise you're going to risk lost writes, where you overwrite an update someone else made between when you read the record and when you wrote the update.
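As a rough illustration of the read-modify-write pattern described above, here is a minimal sketch using a plain DB-API connection. The `records` table, its `owner_id` and `title` columns, and the `rename_record` helper are all hypothetical names made up for this example. It assumes a connection with autocommit off (as psycopg2 provides by default), so the first `execute()` opens the transaction.

```python
# Hypothetical schema: records(id, owner_id, title). Table and column names
# are assumptions for illustration only.
LOCK_SQL = "SELECT owner_id FROM records WHERE id = %s FOR UPDATE"
UPDATE_SQL = "UPDATE records SET title = %s WHERE id = %s"


def rename_record(conn, record_id, user_id, new_title):
    """Read, check permission, and write inside one transaction.

    `conn` is any DB-API 2.0 connection to Postgres with autocommit off,
    so the first execute() implicitly begins a transaction.
    """
    cur = conn.cursor()
    try:
        # Lock the row so no other transaction can modify it until we finish.
        cur.execute(LOCK_SQL, (record_id,))
        row = cur.fetchone()
        if row is None or row[0] != user_id:
            conn.rollback()  # ends the transaction and releases the row lock
            return False
        cur.execute(UPDATE_SQL, (new_title, record_id))
        conn.commit()  # makes the write visible and releases the lock
        return True
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

In Django, the corresponding tools are `select_for_update()` on the queryset, used inside `transaction.atomic()`.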

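SERIALIZABLE transaction isolation can also help with this: instead of blocking, Postgres aborts one of the conflicting transactions with a serialization failure, and your application has to be prepared to retry it.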
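If you go the SERIALIZABLE route, the application has to retry transactions that Postgres aborts with a serialization failure (SQLSTATE 40001). The following is only a sketch of such a retry loop; `run_serializable` and its `work` callback are hypothetical names, and the lookup of the error code assumes a driver that exposes it as `pgcode` (psycopg2) or `sqlstate` (psycopg 3).

```python
SERIALIZATION_FAILURE = "40001"  # SQLSTATE for a serialization failure


def run_serializable(conn, work, max_attempts=5):
    """Run work(cursor) in a SERIALIZABLE transaction, retrying on conflict.

    `conn` is a DB-API 2.0 connection with autocommit off; `work` is a
    caller-supplied function that issues the reads and writes.
    """
    for attempt in range(max_attempts):
        cur = conn.cursor()
        try:
            # Must be the first statement of the transaction.
            cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
            result = work(cur)
            conn.commit()  # the failure can also surface here
            return result
        except Exception as exc:
            conn.rollback()
            code = getattr(exc, "pgcode", None) or getattr(exc, "sqlstate", None)
            # Re-raise anything that is not a serialization failure,
            # and give up after the last allowed attempt.
            if code != SERIALIZATION_FAILURE or attempt == max_attempts - 1:
                raise
        finally:
            cur.close()
```

Note that `work` may run more than once, so it should not have side effects outside the database.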

You really need to understand concurrency and isolation. Unfortunately the only simple, easy "just do X" answer without understanding it is to begin every transaction by locking all the tables involved. Most people don't want to do that.

I suggest a read (or two, or three, or four - it's hard material) of the Postgres transaction isolation docs. Experiment with concurrent psql sessions (multiple terminals) to create race conditions and conflicts.

answered Sep 28 '22 14:09

Craig Ringer

Ideally (if possible) you would do all of your four steps in a single data-modifying CTE (which automatically happens inside a single transaction).

That still does not rule out race conditions; it just makes them very unlikely, because the time frame between SELECT .. FOR UPDATE and a later UPDATE is minimized. You should still use FOR UPDATE (or another appropriate lock level) to guard against race conditions under heavy concurrent access.

This is not the typical (inefficient) approach of a web framework like Django, but it's the superior approach. It optimizes performance in a number of ways:

Fewer round trips to the db server (probably most important)
Minimize lock times
Allow Postgres to optimize queries

When using SELECT .. FOR UPDATE in a data-modifying CTE, be aware that a CTE containing only a SELECT is not executed at all unless it is referenced by the rest of the query, in which case it would not lock rows as intended.
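To make the single-statement idea concrete, here is one way the four steps from the question might be folded into a data-modifying CTE. It is a sketch under assumptions: the `records` and `record_items` tables, their columns, and the `edit_in_one_statement` helper are invented for this example. The locking CTE `target` is referenced by both UPDATEs, so it is actually executed.

```python
# Hypothetical schema: records(id, owner_id, title) and
# record_items(id, record_id, archived). All names are assumptions.
EDIT_SQL = """
WITH target AS (
    SELECT id
    FROM records
    WHERE id = %(record_id)s
      AND owner_id = %(user_id)s   -- permission check
    FOR UPDATE                      -- lock the row before modifying it
),
upd_record AS (
    UPDATE records r
    SET title = %(title)s
    FROM target t
    WHERE r.id = t.id
    RETURNING r.id
),
upd_items AS (
    UPDATE record_items i
    SET archived = true
    FROM target t
    WHERE i.record_id = t.id
    RETURNING i.id
)
SELECT (SELECT count(*) FROM upd_record) AS records_updated,
       (SELECT count(*) FROM upd_items)  AS items_updated
"""


def edit_in_one_statement(conn, record_id, user_id, title):
    """Run the whole get/check/update sequence in one round trip.

    `conn` is a DB-API 2.0 connection whose driver accepts named
    %(name)s parameters (psycopg does). Returns the two row counts;
    both are 0 when the permission check in `target` matches nothing.
    """
    cur = conn.cursor()
    try:
        cur.execute(EDIT_SQL, {"record_id": record_id,
                               "user_id": user_id,
                               "title": title})
        records_updated, items_updated = cur.fetchone()
        conn.commit()
        return records_updated, items_updated
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Putting the permission check in the WHERE clause of `target` means a failed check simply yields no rows, and neither UPDATE touches anything.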

Code examples for data-modifying CTEs:

How to use UPDATE in PostgreSQL with variable table?
Are SELECT type queries the only type that can be nested?

There are many more on SO. Try a search.

answered Sep 28 '22 14:09

Erwin Brandstetter

Tags:

database

postgresql

django

transactions

Scott Stafford
