When to use SELECT ... FOR UPDATE?

Tags: sql, sql-server, mysql, transactions, select-for-update

Please help me understand the use-case behind SELECT ... FOR UPDATE.

Question 1: Is the following a good example of when SELECT ... FOR UPDATE should be used?

Given:

rooms[id]
tags[id, name]
room_tags[room_id, tag_id]
- room_id and tag_id are foreign keys

The application wants to list all rooms and their tags, but needs to differentiate between rooms with no tags versus rooms that have been removed. If SELECT ... FOR UPDATE is not used, what could happen is:

Initially:
- rooms contains [id = 1]
- tags contains [id = 1, name = 'cats']
- room_tags contains [room_id = 1, tag_id = 1]
Thread 1: SELECT id FROM rooms;
- returns [id = 1]
Thread 2: DELETE FROM room_tags WHERE room_id = 1;
Thread 2: DELETE FROM rooms WHERE id = 1;
Thread 2: [commits the transaction]
Thread 1: SELECT tags.name FROM room_tags, tags WHERE room_tags.room_id = 1 AND tags.id = room_tags.tag_id;
- returns an empty list

Now Thread 1 thinks that room 1 has no tags, but in reality the room has been removed. To solve this problem, Thread 1 should SELECT id FROM rooms FOR UPDATE, thereby preventing Thread 2 from deleting from rooms until Thread 1 is done. Is that correct?

Question 2: When should one use SERIALIZABLE transaction isolation versus READ_COMMITTED with SELECT ... FOR UPDATE?

Answers are expected to be portable (not database-specific). If that's not possible, please explain why.

asked Jun 07 '12 16:06

Gili

2 Answers

The only portable way to achieve consistency between rooms and tags, and to make sure rooms are never returned after they have been deleted, is to lock them with SELECT FOR UPDATE.

However, in some systems locking is a side effect of concurrency control, and you can achieve the same result without specifying FOR UPDATE explicitly.
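Applied to the question's scenario, a minimal sketch of Thread 1 might look like this. It assumes an engine that accepts START TRANSACTION and SELECT ... FOR UPDATE (for example PostgreSQL or MySQL with InnoDB); SQL Server does not accept FOR UPDATE in a plain SELECT and uses table hints instead.

```sql
-- Thread 1: list rooms and their tags without racing a concurrent delete.
START TRANSACTION;

-- Lock every room row being reported on. Until this transaction ends,
-- a concurrent DELETE FROM rooms on any of these rows has to wait.
SELECT id FROM rooms FOR UPDATE;

-- Room 1 cannot be deleted before COMMIT, so an empty result here
-- means the room exists but has no tags.
SELECT tags.name
FROM room_tags
JOIN tags ON tags.id = room_tags.tag_id
WHERE room_tags.room_id = 1;

-- Releases the row locks; a waiting DELETE can now proceed.
COMMIT;
```

Since Thread 1 only reads, FOR SHARE (PostgreSQL, MySQL 8.0) would also make the DELETE wait, while still letting other sessions take shared locks on the same rows.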

> To solve this problem, Thread 1 should SELECT id FROM rooms FOR UPDATE, thereby preventing Thread 2 from deleting from rooms until Thread 1 is done. Is that correct?

This depends on the concurrency control your database system is using.

- MyISAM in MySQL (and several other older systems) locks the whole table for the duration of a query.
- In SQL Server, SELECT queries place shared locks on the records / pages / tables they have examined, while DML queries place update locks (which later get promoted to exclusive or demoted to shared locks). Exclusive locks are incompatible with shared locks, so either the SELECT or the DELETE query will block until the other session commits.
- In databases which use MVCC (like Oracle, PostgreSQL, MySQL with InnoDB), a DML query creates a copy of the record (in one way or another), and generally readers do not block writers and vice versa. For these databases, SELECT FOR UPDATE comes in handy: it makes either the SELECT or the DELETE query block until the other session commits, just as SQL Server does.
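To make the MVCC case concrete, here is a sketch of the question's interleaving run twice, once with a plain SELECT and once with FOR UPDATE. It assumes PostgreSQL, with both sessions at READ COMMITTED (PostgreSQL's default); the /* S1 */ and /* S2 */ prefixes only label which of two separate connections runs each statement, in the order shown.

```sql
-- Variant A: plain SELECT. Session 2 is never blocked.
/* S1 */ START TRANSACTION;
/* S1 */ SELECT id FROM rooms;                     -- returns 1, takes no row lock
/* S2 */ START TRANSACTION;
/* S2 */ DELETE FROM room_tags WHERE room_id = 1;
/* S2 */ DELETE FROM rooms WHERE id = 1;           -- does not wait
/* S2 */ COMMIT;
/* S1 */ SELECT tags.name FROM room_tags JOIN tags ON tags.id = room_tags.tag_id
         WHERE room_tags.room_id = 1;              -- empty: the deletes are committed
/* S1 */ COMMIT;

-- Variant B (starting again from the initial data): locking read.
/* S1 */ START TRANSACTION;
/* S1 */ SELECT id FROM rooms FOR UPDATE;          -- locks the row id = 1
/* S2 */ START TRANSACTION;
/* S2 */ DELETE FROM room_tags WHERE room_id = 1;  -- not blocked: S1 locked only rooms
/* S2 */ DELETE FROM rooms WHERE id = 1;           -- blocks until S1 ends
/* S1 */ SELECT tags.name FROM room_tags JOIN tags ON tags.id = room_tags.tag_id
         WHERE room_tags.room_id = 1;              -- 'cats': S2 has not committed
/* S1 */ COMMIT;                                   -- S2's DELETE now completes
/* S2 */ COMMIT;
```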

> When should one use REPEATABLE_READ transaction isolation versus READ_COMMITTED with SELECT ... FOR UPDATE?

Generally, REPEATABLE READ does not forbid phantom rows (rows that appeared or disappeared in another transaction, rather than being modified):

- In Oracle and earlier PostgreSQL versions, REPEATABLE READ is actually a synonym for SERIALIZABLE. Basically, this means that the transaction does not see changes made after it has started. So in this setup, the last Thread 1 query will return the room's tags as if the room had never been deleted (which may or may not be what you want). If you don't want to show rooms after they have been deleted, you should lock the rows with SELECT FOR UPDATE.
- In InnoDB, REPEATABLE READ and SERIALIZABLE are different things: readers in SERIALIZABLE mode set next-key locks on the records they evaluate, effectively preventing concurrent DML on them. So you don't need SELECT FOR UPDATE in SERIALIZABLE mode, but you do need it in REPEATABLE READ or READ COMMITTED.
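A sketch of the snapshot behavior from the first bullet, assuming PostgreSQL 9.1 or later (where REPEATABLE READ provides snapshot isolation) and a session 2 that runs each statement in autocommit mode. The /* S1 */ and /* S2 */ prefixes label two separate connections, run in the order shown.

```sql
-- PostgreSQL 9.1+. S1 reads from a snapshot; S2 uses autocommit.
/* S1 */ START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
/* S1 */ SELECT id FROM rooms;                     -- snapshot taken here; returns 1
/* S2 */ DELETE FROM room_tags WHERE room_id = 1;  -- commits immediately
/* S2 */ DELETE FROM rooms WHERE id = 1;           -- commits immediately, no waiting
/* S1 */ SELECT tags.name FROM room_tags JOIN tags ON tags.id = room_tags.tag_id
         WHERE room_tags.room_id = 1;              -- still 'cats': S1 reads its snapshot
/* S1 */ COMMIT;
```

S1 gets a consistent picture, but of the database as it was when the snapshot was taken: room 1 is reported with its tag even though it no longer exists.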

Note that the SQL standard prescribes which phenomena (dirty reads, non-repeatable reads, phantoms) each isolation level must prevent, but does not define how (with locking, with MVCC, or otherwise).

When I say "you don't need SELECT FOR UPDATE", I really should have added "because of side effects of certain database engine implementations".

answered Oct 18 '22 00:10

Quassnoi

Short answers:

Q1: Yes.

Q2: Doesn't matter which you use.

Long answer:

A select ... for update will (as the name implies) select certain rows, but also lock them as if they had already been updated by the current transaction (or as if the identity update had been performed). This allows you to update them again in the current transaction and then commit, without another transaction being able to modify these rows in any way.
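A minimal read-modify-write sketch using the question's tags table. Renaming the tag to 'kittens' is a hypothetical application decision, and START TRANSACTION assumes PostgreSQL or MySQL syntax.

```sql
START TRANSACTION;

-- Read the current value and lock the row as if it had already been updated.
SELECT name FROM tags WHERE id = 1 FOR UPDATE;  -- returns 'cats'

-- Decide on the new value in the application, then write it. No other
-- transaction can have changed the row since the SELECT above.
UPDATE tags SET name = 'kittens' WHERE id = 1;  -- 'kittens' is a hypothetical value

-- Makes the change visible and releases the lock.
COMMIT;
```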

Another way of looking at it: it is as if the following two statements were executed atomically:

select * from my_table where my_condition;

update my_table set my_column = my_column where my_condition;

Since the rows affected by my_condition are locked, no other transaction can modify them in any way, and hence the transaction isolation level makes no difference here.

Note also that transaction isolation level is independent of locking: setting a different isolation level doesn't allow you to get around locking and update rows in a different transaction that are locked by your transaction.
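A sketch of that point, assuming PostgreSQL syntax (PostgreSQL accepts READ UNCOMMITTED but runs it as READ COMMITTED). The /* S1 */ and /* S2 */ prefixes label two separate connections, and 'dogs' is a hypothetical value. Whatever level session 2 chooses, its UPDATE has to wait for session 1's lock.

```sql
/* S1 */ START TRANSACTION;
/* S1 */ SELECT name FROM tags WHERE id = 1 FOR UPDATE;  -- row lock on tag 1
/* S2 */ START TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
/* S2 */ SELECT name FROM tags WHERE id = 1;             -- plain read: does not wait
/* S2 */ UPDATE tags SET name = 'dogs' WHERE id = 1;     -- waits for S1's lock
/* S1 */ COMMIT;                                         -- S2's UPDATE now proceeds
/* S2 */ COMMIT;
```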

What transaction isolation levels do guarantee (at different levels) is the consistency of data while transactions are in progress.

answered Oct 18 '22 01:10

Colin 't Hart
