Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL interprets SERIALIZABLE less strenuously than PostgreSQL. Is it correct?

When using SERIALIZABLE transactions to implement a pattern of inserting a value into a database only if it does not already exist, I have observed a significant difference between MySQL and PostgreSQL in their definition of the SERIALIZABLE isolation level.

Consider the following table:

CREATE TABLE person (
    person_id INTEGER PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR NOT NULL
);

And the following insertion code, run in concurrently on two connections:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT person_id FROM person WHERE name = 'Bob Ross';
-- sync point: both transactions should run through here before proceeding to
-- demonstrate the effect

-- 0 results, so we will insert
INSERT INTO person (name) VALUES ('Bob Ross');
SELECT last_insert_id();
COMMIT;

In PostgreSQL (after appropriate translation of the SQL), the effect is as I expect: only one of the transactions can successfully commit. This is consistent with my understanding of SERIALIZABLE as described by PostgreSQL, and other sources quoting from the ANSI standard: there exists a serial execution of the transactions that would produce the same effect. There is no serial execution of these two transactions that returns 0 results for the search and then adds the entry.

In MySQL 5.7, both transactions succeed and there are 2 ‘Bob Ross’ entries in the table. The MySQL documentation defines SERIALIZABLE in terms of prohibiting dirty reads, nonrepeatable reads, and phantom reads; it makes no reference to the existence of a serial execution.

SQLite also correctly pevents double insertion, at least in its default mode, due to its conservative locking strategy.

My question: Is MySQL's behavior in this case correct, or is it violating the SQL standard by allowing these transactions to both succeed? I suspect the answer may hinge on the definition of ‘effect’ — does observing an empty result set from the first SELECT count as an ‘effect’ for the purposes of two serial executions having the same effect?

A couple other comments to help scope this question:

  • I know I could achieve the desired behavior in MySQL by first doing an insert with ON CONFLICT IGNORE, and then doing the select. I am trying to understand why the equivalent standard SQL is not exhibiting the same behavior in both engines.
  • I know that I could probably also fix it by putting a unique constraint on the name field, which would arguably be a better data model anyway. But the core question still remains: why do these transactions both succeed?
like image 468
Michael Ekstrand Avatar asked Mar 21 '18 18:03

Michael Ekstrand


People also ask

What is serializable mysql?

SERIALIZABLE. SERIALIZABLE completely isolates the effect of one transaction from others. It is similar to REPEATABLE READ with the additional restriction that row selected by one transaction cannot be changed by another until the first transaction finishes. The phenomenon of phantom reads is avoided.

What is Postgres serializable?

Serializable provides the highest transaction isolation. This level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. However, applications using this level must be prepared to retry transactions due to serialization failures.

What is serializable read in SQL?

SERIALIZABLE is the strictest SQL transaction isolation level. While this isolation level permits transactions to run concurrently, it creates the effect that transactions are running in serial order. Transactions acquire locks for read and write operations.

What is transaction isolation level in PostgreSQL?

Read Committed is the default isolation level in PostgreSQL. When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions.


1 Answers

The SQL standard says in chapter 4.35.4 Isolation levels of SQL-transactions (emphasis mine):

The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.

A little further down, it goes on to confuse the issue:

The isolation level specifies the kind of phenomena that can occur during the execution of concurrent SQL-transactions. The following phenomena are possible:

[skipping definition of P1 (“Dirty read”), P2 (“Non-repeatable read”) and P3 (“Phantom”)]

The four isolation levels guarantee that each SQL-transaction will be executed completely or not at all, and that no updates will be lost. The isolation levels are different with respect to phenomena P1, P2, and P3. Table 8, “SQL-transaction isolation levels and the three phenomena” specifies the phenomena that are possible and not possible for a given isolation level.

+------------------+--------------+--------------+--------------+ 
| Level            | P1           | P2           | P3           |
+------------------+--------------+--------------+--------------+
| READ UNCOMMITTED | Possible     | Possible     | Possible     |
+------------------+--------------+--------------+--------------+
| READ COMMITTED   | Not Possible | Possible     | Possible     |
+------------------+--------------+--------------+--------------+
| REPEATABLE READ  | Not Possible | Not Possible | Possible     |
+------------------+--------------+--------------+--------------+
| SERIALIZABLE     | Not Possible | Not Possible | Not Possible |
+------------------+--------------+--------------+--------------+

NOTE 53 — The exclusion of these phenomena for SQL-transactions executing at isolation level SERIALIZABLE is a consequence of the requirement that such transactions be serializable.

This wording has had the unfortunate consequence that many implementors decided that it is enough to exclude direy reads, non-repeatable reads and phantom reads to correctly implement the SERIALIZABLE isolation level, even though the note clarifies that this is not the definition, but a consequence of the definition.

So I would argue that MySQL is wrong, but it is not alone: Oracle database interprets SERIALIZABLE in the same fashion.

like image 173
Laurenz Albe Avatar answered Sep 27 '22 02:09

Laurenz Albe