Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL update changing multiple columns is non-atomic?

I'm having the following problem using Django with MySQL 5.5.22.

Given a table with columns id, level and a 2x2 matrix stored as a11, a12, a21, a22, I have this row:

id   a11   a12   a21   a22   level
324  3     2     5     3     2

Given a queryset qs, I do the following update:

qs.update(
    a11=(b12 * a21 - b11 * a22) * F('a11') + (b11 * a12 - b12 * a11) * F('a21'),
    a12=(b12 * a21 - b11 * a22) * F('a12') + (b11 * a12 - b12 * a11) * F('a22'),
    a21=(b22 * a21 - b21 * a22) * F('a11') + (b21 * a12 - b22 * a11) * F('a21'),
    a22=(b22 * a21 - b21 * a22) * F('a12') + (b21 * a12 - b22 * a11) * F('a22'),
    level=(F('level') - 1)
    )

For which django generates the following query (got it from db.connection.queries, remove the where clause for brevity):

UPDATE `storage` 
SET 
`a21` = (3 * `storage`.`a11`) + (-1 * `storage`.`a21`), 
`a22` = (3 * `storage`.`a12`) + (-1 * `storage`.`a22`), 
`level` = `storage`.`level` - -1, 
`a11` = (2 * `storage`.`a11`) + (-1 * `storage`.`a21`), 
`a12` = (2 * `storage`.`a12`) + (-1 * `storage`.`a22`) 

And my row looks like this after that:

id   a11   a12   a21   a22   level
324  2     1     4     3     1

For any row, a12*a21 - a11*a22 = 1 is supposed to be True, and according to that, the row was supposed to be:

id   a11   a12   a21   a22   level
324  1     1     4     3     1

This is what I get on SQLite, with Django generating the same query, and it took me a lot of time to figure that MySQL was doing something different. From the query, it seems like when updating interdepent multiple rows, MySQL doesn't treat it as a single atomic operation, and as columns are updated, they affect the values dependent on them. I confirmed this seems to be what happens by the following code on the Python prompt:

>>> a11, a12, a21, a22 = (3, 2, 5, 3)
>>> (2 * a11) + (-1 * a21),\
... (2 * a12) + (-1 * a22),\
... (3 * a11) + (-1 * a21),\
... (3 * a12) + (-1 * a22)
(1, 1, 4, 3)

If columns are updated one at a time, in the same order given by the query:

>>> a11, a12, a21, a22 = (3, 2, 5, 3)
>>> a21 = (3*a11) + (-1*a21)
>>> a22 = (3*a12) + (-1*a22)
>>> a11 = (2*a11) + (-1*a21)
>>> a12 = (2*a12) + (-1*a22)
>>> (a11, a12, a21, a22)
(2, 1, 4, 3)

This is really scary behavior, since this is a library meant to be used cross-platform. My questions are:

  1. Which one is doing it wrong, MySQL or SQLite? Can this be considered a bug?
  2. What can I expect from other major databases (Oracle, PostgreSQL and SQLServer)?
  3. What can I do with the Django ORM (no raw queries) to normalize this behavior?

edit

The problem is clear, but I'm still looking for a solution. Pulling all values and pushing them back is not an acceptable solution for this particular application.

like image 264
Pedro Werneck Avatar asked May 21 '12 21:05

Pedro Werneck


2 Answers

PostgreSQL, Oracle, and SQL Server all treat this as an atomic operation. See the following SQL Fiddle, and switch the server to see the behavior of the following SQL:

CREATE TABLE Swap (
  a CHAR(1),
  b CHAR(1)
);

INSERT INTO Swap (a, b) VALUES ('a', 'b');

UPDATE Swap SET a = b, b = a;

SELECT * FROM Swap;

MySQL was the only RBDMS that implements this with both columns containing the same value after the update.

As far as how you would resolve this, I would instead pull the values from the database, do the calculations inside of your application (instead of your update statement), and then update the database with the calculated values. This way you can guarantee that the calculation will be performed in a consistent manner.

like image 91
Michael Fredrickson Avatar answered Nov 16 '22 03:11

Michael Fredrickson


As stated in the MySQL manual:

The second assignment in the following statement sets col2 to the current (updated) col1 value, not the original col1 value. The result is that col1 and col2 have the same value. This behavior differs from standard SQL.

UPDATE t1 SET col1 = col1 + 1, col2 = col1;

Therefore, in your case, the value being used for a21 when evaluating the expression `a11` = (2 * `storage`.`a11`) + (-1 * `storage`.`a21`) is the new, updated, value of 4 rather than the original value of 5. As the manual says, this behaviour differs from standard SQL.

You could instead use a self-join with the multiple-table UPDATE syntax, however I don't know whether something like this can be implemented using the Django ORM:

UPDATE storage AS old
  JOIN storage AS new USING (id)
SET
  new.a21   = (3 * old.a11) + (-1 * old.a21),
  new.a22   = (3 * old.a12) + (-1 * old.a22),
  new.level = old.level - -1,
  new.a11   = (2 * old.a11) + (-1 * old.a21),
  new.a12   = (2 * old.a12) + (-1 * old.a22);

See it on sqlfiddle.

My only other thought (which should definitely be implementable within Django) is to split the update into separate parts, defining the fields updated in later parts in relation to the new (rather than the old) values of those fields which have been updated in earlier parts:

UPDATE storage
SET    a21   = (3 * a11) + (-1 * a21),
       a22   = (3 * a12) + (-1 * a22),
       level = level - -1;

UPDATE storage
SET    a11   = (2 * a11) + (-1 * (3*a11 - a21)),
       a12   = (2 * a12) + (-1 * (3*a12 - a22));

To prevent concurrency issues, you ought to perform these two updates within a transaction (if supported by the RDBMS).

like image 20
eggyal Avatar answered Nov 16 '22 01:11

eggyal