When using <code>SERIALIZABLE</code> transactions to implement a pattern of inserting a value into a database only if it does not already exist, I have observed a significant difference between MySQL and PostgreSQL in their definition of the <code>SERIALIZABLE</code> isolation level. Consider the following table: <pre class="prettyprint"><code>CREATE TABLE person ( person_id INTEGER PRIMARY KEY AUTO_INCREMENT, name VARCHAR NOT NULL ); </code></pre> And the following insertion code, run in concurrently on two connections: <pre class="prettyprint"><code>SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; START TRANSACTION; SELECT person_id FROM person WHERE name = 'Bob Ross'; -- sync point: both transactions should run through here before proceeding to -- demonstrate the effect -- 0 results, so we will insert INSERT INTO person (name) VALUES ('Bob Ross'); SELECT last_insert_id(); COMMIT; </code></pre> In PostgreSQL (after appropriate translation of the SQL), the effect is as I expect: only one of the transactions can successfully commit. This is consistent with my understanding of SERIALIZABLE as described by PostgreSQL, and other sources quoting from the ANSI standard: there exists a serial execution of the transactions that would produce the same effect. There is no serial execution of these two transactions that returns 0 results for the search and then adds the entry. In MySQL 5.7, both transactions succeed and there are 2 ‘Bob Ross’ entries in the table. The MySQL documentation defines <code>SERIALIZABLE</code> in terms of prohibiting dirty reads, nonrepeatable reads, and phantom reads; it makes no reference to the existence of a serial execution. SQLite also correctly pevents double insertion, at least in its default mode, due to its conservative locking strategy. My question: Is MySQL's behavior in this case correct, or is it violating the SQL standard by allowing these transactions to both succeed? I suspect the answer may hinge on the definition of ‘effect’ — does observing an empty result set from the first <code>SELECT</code> count as an ‘effect’ for the purposes of two serial executions having the same effect? A couple other comments to help scope this question: <ul> <li>I know I could achieve the desired behavior in MySQL by first doing an insert with <code>ON CONFLICT IGNORE</code>, and then doing the select. I am trying to understand why the equivalent standard SQL is not exhibiting the same behavior in both engines.</li> <li>I know that I could probably also fix it by putting a unique constraint on the <code>name</code> field, which would arguably be a better data model anyway. But the core question still remains: why do these transactions both succeed?</li> </ul>

The SQL standard says in chapter 4.35.4 Isolation levels of SQL-transactions (emphasis mine): <blockquote> The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins. </blockquote> A little further down, it goes on to confuse the issue: <blockquote> The isolation level specifies the kind of phenomena that can occur during the execution of concurrent SQL-transactions. The following phenomena are possible: [skipping definition of P1 (“Dirty read”), P2 (“Non-repeatable read”) and P3 (“Phantom”)] The four isolation levels guarantee that each SQL-transaction will be executed completely or not at all, and that no updates will be lost. The isolation levels are different with respect to phenomena P1, P2, and P3. Table 8, “SQL-transaction isolation levels and the three phenomena” specifies the phenomena that are possible and not possible for a given isolation level. <pre class="prettyprint lang-none prettyprint-override"><code>+------------------+--------------+--------------+--------------+ | Level | P1 | P2 | P3 | +------------------+--------------+--------------+--------------+ | READ UNCOMMITTED | Possible | Possible | Possible | +------------------+--------------+--------------+--------------+ | READ COMMITTED | Not Possible | Possible | Possible | +------------------+--------------+--------------+--------------+ | REPEATABLE READ | Not Possible | Not Possible | Possible | +------------------+--------------+--------------+--------------+ | SERIALIZABLE | Not Possible | Not Possible | Not Possible | +------------------+--------------+--------------+--------------+ </code></pre> NOTE 53 — The exclusion of these phenomena for SQL-transactions executing at isolation level SERIALIZABLE is a consequence of the requirement that such transactions be serializable. </blockquote> This wording has had the unfortunate consequence that many implementors decided that it is enough to exclude direy reads, non-repeatable reads and phantom reads to correctly implement the <code>SERIALIZABLE</code> isolation level, even though the note clarifies that this is not the definition, but a consequence of the definition. So I would argue that MySQL is wrong, but it is not alone: Oracle database interprets <code>SERIALIZABLE</code> in the same fashion.

MySQL interprets SERIALIZABLE less strenuously than PostgreSQL. Is it correct?

Tags:

sql

mysql

postgresql

standards

transaction-isolation

When using SERIALIZABLE transactions to implement a pattern of inserting a value into a database only if it does not already exist, I have observed a significant difference between MySQL and PostgreSQL in their definition of the SERIALIZABLE isolation level.

Consider the following table:

Click to copy

CREATE TABLE person (
    person_id INTEGER PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR NOT NULL
);

And the following insertion code, run in concurrently on two connections:

Click to copy

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT person_id FROM person WHERE name = 'Bob Ross';
-- sync point: both transactions should run through here before proceeding to
-- demonstrate the effect

-- 0 results, so we will insert
INSERT INTO person (name) VALUES ('Bob Ross');
SELECT last_insert_id();
COMMIT;

In PostgreSQL (after appropriate translation of the SQL), the effect is as I expect: only one of the transactions can successfully commit. This is consistent with my understanding of SERIALIZABLE as described by PostgreSQL, and other sources quoting from the ANSI standard: there exists a serial execution of the transactions that would produce the same effect. There is no serial execution of these two transactions that returns 0 results for the search and then adds the entry.

In MySQL 5.7, both transactions succeed and there are 2 ‘Bob Ross’ entries in the table. The MySQL documentation defines SERIALIZABLE in terms of prohibiting dirty reads, nonrepeatable reads, and phantom reads; it makes no reference to the existence of a serial execution.

SQLite also correctly pevents double insertion, at least in its default mode, due to its conservative locking strategy.

My question: Is MySQL's behavior in this case correct, or is it violating the SQL standard by allowing these transactions to both succeed? I suspect the answer may hinge on the definition of ‘effect’ — does observing an empty result set from the first SELECT count as an ‘effect’ for the purposes of two serial executions having the same effect?

A couple other comments to help scope this question:

I know I could achieve the desired behavior in MySQL by first doing an insert with ON CONFLICT IGNORE, and then doing the select. I am trying to understand why the equivalent standard SQL is not exhibiting the same behavior in both engines.
I know that I could probably also fix it by putting a unique constraint on the name field, which would arguably be a better data model anyway. But the core question still remains: why do these transactions both succeed?

468

asked Mar 21 '18 18:03

Michael Ekstrand

1 Answers

The SQL standard says in chapter 4.35.4 Isolation levels of SQL-transactions (emphasis mine):

The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.

A little further down, it goes on to confuse the issue:

The isolation level specifies the kind of phenomena that can occur during the execution of concurrent SQL-transactions. The following phenomena are possible:

[skipping definition of P1 (“Dirty read”), P2 (“Non-repeatable read”) and P3 (“Phantom”)]

The four isolation levels guarantee that each SQL-transaction will be executed completely or not at all, and that no updates will be lost. The isolation levels are different with respect to phenomena P1, P2, and P3. Table 8, “SQL-transaction isolation levels and the three phenomena” specifies the phenomena that are possible and not possible for a given isolation level.

Click to copy
+------------------+--------------+--------------+--------------+ 
| Level            | P1           | P2           | P3           |
+------------------+--------------+--------------+--------------+
| READ UNCOMMITTED | Possible     | Possible     | Possible     |
+------------------+--------------+--------------+--------------+
| READ COMMITTED   | Not Possible | Possible     | Possible     |
+------------------+--------------+--------------+--------------+
| REPEATABLE READ  | Not Possible | Not Possible | Possible     |
+------------------+--------------+--------------+--------------+
| SERIALIZABLE     | Not Possible | Not Possible | Not Possible |
+------------------+--------------+--------------+--------------+
NOTE 53 — The exclusion of these phenomena for SQL-transactions executing at isolation level SERIALIZABLE is a consequence of the requirement that such transactions be serializable.

This wording has had the unfortunate consequence that many implementors decided that it is enough to exclude direy reads, non-repeatable reads and phantom reads to correctly implement the SERIALIZABLE isolation level, even though the note clarifies that this is not the definition, but a consequence of the definition.

So I would argue that MySQL is wrong, but it is not alone: Oracle database interprets SERIALIZABLE in the same fashion.

173

answered Sep 27 '22 02:09

Laurenz Albe

Related questions
                            
                                How can I combine 2 SQL queries and retrieve a cumulative count?
                            
                                MySql TIMESTAMP column is auto updated. Why?
                            
                                Order sum and order value is not fetching properly in mysql
                            
                                Mysql show no more than 5 concurrent connection
                            
                                How to get the overall most popular tags from normalized tables?
                            
                                Creating separate database connection for every celery worker
                            
                                Can't connect to MySQL docker container created via docker-compose
                            
                                mysql 5.7 is much slower than mysql 5.6 in medium sql
                            
                                Creating Users on RDS MySQL Read Replicas
                            
                                Is it recommended to run clustered database with Kubernetes in production environment?
                            
                                Doctrine: Relations between two entities in two databases
                            
                                How to accurately use aggregate functions when querying multiple tables?
                            
                                ERROR 1030 (HY000) at line 25: Got error 168 from storage engine
                            
                                Sonar setup with MySql
                            
                                docker-compose Unknown MySQL server host 'mysql' (airflow)
                            
                                Golang sql named parameter query
                            
                                Why is mySQL able to resolve these column aliases when normally one can't reuse an alias?
                            
                                Connect to MySQL/MariaDB with PHP PDO with --ssl without certificate
                            
                                nodejs- TypeError: connection.connect is not a function
                            
                                PDO::ERRMODE_EXCEPTION doesn't suppress warning

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With