 

Database Transactions: Difference between 'write skew' and 'lost update'

Can somebody explain to me what exactly the difference is between 'write skew' and 'lost update' in database transaction theory? Can somebody give me an example?

MjeOsX asked Jan 07 '15 19:01



1 Answer

Informally, lost updates and write skew are ways that concurrent write transactions can interfere with each other.

Write skew happens when an update is made within a transaction based on stale data. Stale data is a value that a transaction has read but that has since been superseded by a write committed by a concurrent transaction.

Lost updates happen when a committed value written by one transaction is overwritten by a subsequent committed write from a concurrent transaction. In fact, a lost update is really a special case of write skew, in which a transaction overwrites the very value it read after that value has become stale.

Consider the case where a database for a retail store maintains an Inventory table. The database does not implement transaction isolation.

The Inventory table has a "ProductId" column and an "InStock" column that counts the number of items that are currently in stock for a particular product. Each purchase (transaction) decrements the "InStock" value by the number of items purchased.

Imagine that the store has two electric shavers (of a specific model) in stock.

Two customers each purchase one of these shavers, simultaneously.

Each of the concurrent purchases (transactions) reads the same value (two) from the shaver's "InStock" record. The transactions each decrement the "InStock" counter and commit the updated value (one) to the database. After both of the concurrent transactions have committed, the counter will incorrectly indicate that the shaver is still in stock (one item remaining).

One of the updates was lost.
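The interleaving above can be replayed with a minimal Python sketch. It assumes a toy in-memory store with no isolation at all; the `inventory` dict and the `purchase_read`/`purchase_write` helpers are hypothetical stand-ins for the Inventory table and the two steps of a purchase, not any real database API.

```python
# Toy "database" with no transaction isolation: every read and write goes
# straight to shared state. All names here are hypothetical.
inventory = {"shaver": 2}

def purchase_read(product_id):
    # Step 1 of a purchase: read the current InStock value.
    return inventory[product_id]

def purchase_write(product_id, seen_in_stock):
    # Step 2: write back a value computed from the earlier read.
    inventory[product_id] = seen_in_stock - 1

# Interleave the two concurrent purchases: both read before either writes.
seen_by_t1 = purchase_read("shaver")  # reads 2
seen_by_t2 = purchase_read("shaver")  # also reads 2
purchase_write("shaver", seen_by_t1)  # writes 1
purchase_write("shaver", seen_by_t2)  # writes 1 again, overwriting the first purchase

print(inventory["shaver"])  # 1, although two shavers were sold
```

The second write is computed from a value that was already stale when it was written, which is why the answer treats a lost update as a special case of acting on stale data.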

Suppose instead that the database implements snapshot isolation (with lost update detection). In this case, lost updates don't happen, because snapshot isolation detects the conflict before an update can be lost: after a transaction commits data, concurrent transactions that attempt to commit writes to the same data are aborted by the database. In our example, the process whose transaction was aborted starts a new transaction that re-reads the "InStock" column, decrements it, and commits the updated value. Assuming no other conflicts, this attempt commits successfully, and the "InStock" column contains the correct value, zero.
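A minimal sketch of this lost-update detection, assuming a toy multi-version store: each transaction reads from the snapshot taken when it began, buffers its writes, and at commit time is aborted if another transaction committed a write to the same key after that snapshot (a rule often called first-committer-wins). `SnapshotStore`, `Transaction`, `WriteConflict`, and `decrement` are hypothetical names; real databases differ in when and how they detect the conflict.

```python
import itertools

class WriteConflict(Exception):
    """A concurrent transaction committed a write to a key this transaction also wrote."""

class SnapshotStore:
    """Toy multi-version store with snapshot reads and first-committer-wins commits."""

    def __init__(self, initial):
        self.clock = itertools.count(1)
        # key -> list of (commit_ts, value), oldest first
        self.versions = {key: [(0, value)] for key, value in initial.items()}

    def begin(self):
        return Transaction(self, start_ts=next(self.clock))

    def committed_value(self, key):
        return self.versions[key][-1][1]

class Transaction:
    def __init__(self, store, start_ts):
        self.store, self.start_ts, self.writes = store, start_ts, {}

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        # Newest version committed no later than this transaction's snapshot.
        return next(v for ts, v in reversed(self.store.versions[key]) if ts <= self.start_ts)

    def write(self, key, value):
        self.writes[key] = value  # buffered until commit

    def commit(self):
        for key in self.writes:
            last_commit_ts = self.store.versions[key][-1][0]
            if last_commit_ts > self.start_ts:
                raise WriteConflict(key)  # someone committed this key after our snapshot
        commit_ts = next(self.store.clock)
        for key, value in self.writes.items():
            self.store.versions[key].append((commit_ts, value))

def decrement(txn, product_id):
    txn.write(product_id, txn.read(product_id) - 1)

store = SnapshotStore({"shaver": 2})
t1, t2 = store.begin(), store.begin()  # both snapshots see InStock = 2
decrement(t1, "shaver")
decrement(t2, "shaver")
t1.commit()                            # InStock = 1
try:
    t2.commit()                        # aborted: t1 already committed "shaver"
except WriteConflict:
    retry = store.begin()              # new snapshot sees InStock = 1
    decrement(retry, "shaver")
    retry.commit()

print(store.committed_value("shaver"))  # 0
```

The retry loop lives in the client here, matching the description above in which the aborted process starts a new transaction rather than the database silently re-running it.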

Transaction isolation is a deep topic.

Furthermore, assume that the database records inventory history in an InventoryHistory table. The InventoryHistory table has the columns "Timestamp", "ProductId", and "InStock" (the number remaining after the purchase). By design, the write to the InventoryHistory table is the last operation in a purchase transaction. After the two transactions commit, their respective InventoryHistory records will each show an "InStock" value of one. This is incorrect, since one of the records should show an "InStock" value of zero. The incorrect InventoryHistory record is an example of write skew.

In this case, snapshot isolation did not prevent anomalous data from being written to the database, since no updates were lost. Rather, the data written was anomalous because a value read by the transaction had become stale; this is write skew. Snapshot isolation does not prevent write skew. To prevent write skew, the database must implement serializable isolation.
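Snapshot isolation's commit check looks only at write sets, so it cannot see a conflict when two transactions make decisions from the same stale snapshot but write to different rows. The sketch below swaps in the textbook form of write skew rather than the InventoryHistory scenario: under a hypothetical rule that at least one shaver must remain in stock across two stockrooms (the `north` and `south` rows), two concurrent sales each check the combined stock, then decrement different rows. `ToySnapshotDB`, `sell_one`, `run`, and the rule itself are illustrative assumptions. Passing `validate_reads=True` adds a simple optimistic check that also aborts a transaction if anything it read was overwritten by a commit after its snapshot was taken; this is one conservative way to get serializable behavior, not how any particular database implements serializable isolation.

```python
class WriteConflict(Exception):
    """Raised when commit-time validation finds a conflicting concurrent commit."""

class ToySnapshotDB:
    """Toy store: each key holds (commit_version, value); transactions read a private snapshot."""

    def __init__(self, data):
        self.rows = {key: (0, value) for key, value in data.items()}
        self.version = 0

    def begin(self):
        snapshot = {key: value for key, (_, value) in self.rows.items()}
        return {"start": self.version, "snap": snapshot, "reads": set(), "writes": {}}

    @staticmethod
    def read(txn, key):
        txn["reads"].add(key)
        return txn["writes"].get(key, txn["snap"][key])

    @staticmethod
    def write(txn, key, value):
        txn["writes"][key] = value

    def commit(self, txn, validate_reads=False):
        # Snapshot isolation validates only the write set (first-committer-wins).
        # validate_reads=True also validates the read set (simple optimistic scheme).
        keys = set(txn["writes"]) | (txn["reads"] if validate_reads else set())
        for key in keys:
            if self.rows[key][0] > txn["start"]:
                raise WriteConflict(key)
        self.version += 1
        for key, value in txn["writes"].items():
            self.rows[key] = (self.version, value)

def sell_one(db, txn, stockroom):
    # Hypothetical rule (assumption): keep at least one shaver in stock overall.
    total = db.read(txn, "north") + db.read(txn, "south")
    if total - 1 >= 1:
        db.write(txn, stockroom, db.read(txn, stockroom) - 1)

def run(validate_reads):
    db = ToySnapshotDB({"north": 1, "south": 1})
    t1, t2 = db.begin(), db.begin()  # both snapshots see north=1, south=1
    sell_one(db, t1, "north")
    sell_one(db, t2, "south")        # disjoint write sets
    db.commit(t1, validate_reads)
    try:
        db.commit(t2, validate_reads)
    except WriteConflict:
        pass                         # a real client would retry t2 on fresh data
    return db.rows["north"][1] + db.rows["south"][1]

print(run(validate_reads=False))  # 0: both sales commit, rule violated (write skew)
print(run(validate_reads=True))   # 1: second sale aborted
```

Each sale is correct in isolation, yet together they break the rule because each acted on a total that the other had made stale. Production engines use more refined techniques than this read-set check, such as strict two-phase locking or serializable snapshot isolation.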

Read this article for a rigorous discussion of write skew, serializability and snapshot isolation.

Joel Stevick answered Sep 19 '22 17:09
