 

Should I include SELECTs in a transaction?

When using a database transaction to group multiple updates, should I include SELECTs inside the transaction as well? For instance, let's say I:

  1. get a record
  2. check edit permissions for that record, using data from the record
  3. update some records
  4. update some other records

Should I start the transaction before the "get a record" stage, or just around the updates?

I'm using Postgres/Django transaction.atomic() but I don't think it matters here.

Scott Stafford asked Jul 02 '14 13:07


2 Answers

The short version: "It depends".

The long version:

If you're doing a read-modify-write cycle, then not only must it be in a transaction, but you must SELECT ... FOR UPDATE any records you later intend to modify. Otherwise you're going to risk lost writes, where you overwrite an update someone else made between when you read the record and when you wrote the update.
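A minimal in-memory simulation of the lost write described above. This is plain Python, not PostgreSQL. Two hypothetical transactions, T1 (+10) and T2 (+5), each read a counter and write back a new value. Without a row lock, T2 overwrites T1's update. With a lock taken at read time, which is what SELECT ... FOR UPDATE does, T2 has to wait and then reads T1's committed value. The `Row` class and the lock helpers are illustrative stand-ins, not a database API.

```python
# In-memory simulation of a read-modify-write race (NOT a database).
# Two hypothetical transactions, T1 (+10) and T2 (+5), update one row.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    value: int
    locked_by: Optional[str] = None  # which transaction holds the row lock


def try_lock_for_update(row: Row, tx: str) -> bool:
    """Take the row lock if it is free (or already ours); False means 'would block'."""
    if row.locked_by in (None, tx):
        row.locked_by = tx
        return True
    return False


def lock_for_update(row: Row, tx: str) -> None:
    if not try_lock_for_update(row, tx):
        raise RuntimeError(f"{tx} would block: row locked by {row.locked_by}")


def commit(row: Row, tx: str) -> None:
    """Committing releases any row lock the transaction holds."""
    if row.locked_by == tx:
        row.locked_by = None


def unlocked_schedule(start: int) -> int:
    """Plain SELECTs: both transactions read before either one writes."""
    row = Row(start)
    t1_read = row.value          # T1: SELECT
    t2_read = row.value          # T2: SELECT (sees the same old value)
    row.value = t1_read + 10     # T1: UPDATE, COMMIT
    row.value = t2_read + 5      # T2: UPDATE, COMMIT -> T1's change is lost
    return row.value


def locked_schedule(start: int) -> int:
    """SELECT ... FOR UPDATE: T2 must wait until T1 commits."""
    row = Row(start)
    lock_for_update(row, "T1")   # T1: SELECT ... FOR UPDATE
    t1_read = row.value
    if try_lock_for_update(row, "T2"):  # T2: SELECT ... FOR UPDATE blocks here
        raise RuntimeError("T2 should have been blocked by T1's row lock")
    row.value = t1_read + 10     # T1: UPDATE
    commit(row, "T1")            # T1: COMMIT releases the lock
    lock_for_update(row, "T2")   # T2's blocked SELECT now proceeds...
    t2_read = row.value          # ...and reads T1's committed value
    row.value = t2_read + 5      # T2: UPDATE
    commit(row, "T2")
    return row.value


print(unlocked_schedule(100), locked_schedule(100))  # 105 115
```

Starting from 100, the unlocked schedule ends at 105 because T1's +10 is lost, while the locked schedule ends at 115.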

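SERIALIZABLE transaction isolation can also help with this: PostgreSQL then aborts one of the conflicting transactions with a serialization failure, and your application must retry it.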
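Under SERIALIZABLE, the usual pattern is to rerun the entire transaction, including its SELECTs, when PostgreSQL reports a serialization failure (SQLSTATE 40001). The reads have to be redone because the decisions you made from them may no longer hold. A minimal sketch of such a retry loop follows. `SerializationFailure` is a hypothetical stand-in for whatever exception your driver raises for that SQLSTATE, and `flaky_txn` is a fake transaction used only to show the control flow.

```python
# Retry wrapper for a transaction run at SERIALIZABLE isolation.
from typing import Callable, TypeVar

T = TypeVar("T")


class SerializationFailure(Exception):
    """Hypothetical stand-in for the driver's SQLSTATE 40001 error."""


def run_serializable(txn: Callable[[], T], max_attempts: int = 3) -> T:
    """Run txn, which must BEGIN, do all its reads and writes, and COMMIT,
    and rerun it from the start if it hits a serialization failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 1
    while True:
        try:
            return txn()
        except SerializationFailure:
            if attempt >= max_attempts:
                raise  # give up and let the caller see the failure
            attempt += 1


# Usage: a fake transaction that conflicts twice, then commits.
calls = {"n": 0}


def flaky_txn() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationFailure("could not serialize access")
    return "committed"


print(run_serializable(flaky_txn))  # committed (on the 3rd attempt)
```

Wrapping the whole unit of work in one callable keeps the SELECTs and the UPDATEs together, so a retry never reuses stale reads from an aborted attempt.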

You really need to understand concurrency and isolation. Unfortunately, the only simple "just do X" answer that works without that understanding is to begin every transaction by locking all the tables involved. Most people don't want to do that.

I suggest reading the transaction isolation docs (two, three or four times; it's hard material). Experiment with concurrent psql sessions in multiple terminals to create race conditions and conflicts.

Craig Ringer answered Sep 28 '22 14:09


Ideally (if possible) you would do all of your four steps in a single data-modifying CTE (which automatically happens inside a single transaction).

That still does not rule out race conditions; it just makes them very unlikely, because the time between SELECT ... FOR UPDATE and the later UPDATE is minimized. You should still use FOR UPDATE (or another appropriate row-locking level) to counter race conditions under heavy concurrent access.

This is not the typical (inefficient) approach of a web framework like Django, but it is the superior one. It improves performance in several ways:

  • Fewer round trips to the db server (probably most important)
  • Minimize lock times
  • Allow Postgres to optimize queries

When using SELECT ... FOR UPDATE in a data-modifying CTE, be aware that a SELECT CTE the rest of the statement never references is not executed at all, so it would not lock the rows as intended.
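A minimal sketch of the question's four steps folded into one statement, sent from Python through any DB-API cursor (for example psycopg2's). The tables `records` and `record_log`, their columns, the "only the owner may edit" rule, and the function name `update_if_permitted` are all hypothetical. The locking CTE `target` is referenced by the UPDATE, so it is actually evaluated and its row is locked.

```python
# One round trip: read + lock, permission check, and both writes in a
# single data-modifying CTE. Schema and permission rule are hypothetical.

UPDATE_IF_PERMITTED = """
WITH target AS (
    SELECT id
    FROM records
    WHERE id = %(record_id)s
      AND owner_id = %(user_id)s   -- step 2: permission check (hypothetical rule)
    FOR UPDATE                     -- step 1: read and lock the row
), upd AS (
    UPDATE records AS r            -- step 3: update the record
    SET status = 'edited'
    FROM target AS t               -- references target, so the lock is taken
    WHERE r.id = t.id
    RETURNING r.id
)
INSERT INTO record_log (record_id, changed_by)   -- step 4: write other rows
SELECT id, %(user_id)s FROM upd
RETURNING record_id;
"""


def update_if_permitted(cur, record_id: int, user_id: int) -> bool:
    """Run the statement on a DB-API cursor; a single statement is atomic on
    its own. Returns True if the edit happened, False if the permission
    check filtered the row out (no row locked, nothing written)."""
    cur.execute(UPDATE_IF_PERMITTED, {"record_id": record_id, "user_id": user_id})
    return cur.fetchone() is not None
```

With psycopg2 you might call it as `with conn, conn.cursor() as cur: update_if_permitted(cur, 7, 42)`, where leaving the connection block commits. Only one statement crosses the wire, which is the round-trip saving described above.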

Code examples for data-modifying CTEs:

  • How to use UPDATE in PostgreSQL with variable table?
  • Are SELECT type queries the only type that can be nested?

There are many more on SO. Try a search.

Erwin Brandstetter answered Sep 28 '22 14:09