Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to properly use transactions and locks to ensure database integrity?

I develop an online reservation system. To simplify let's say that users can book multiple items and each item can be booked only once. Items are first added to the shopping cart.

App uses MySql / InnoDB database. According to MySql documentation, default isolation level is Repeatable reads.

Here is the checkout procedure I've came up with so far:

  1. Begin transaction
  2. Select items in the shopping cart (with for update lock)
    Records from cart-item and items tables are fetched at this step.
  3. Check if items haven't been booked by anybody else
    Basically check if quantity > 0. It's more complicated in the real application, thus I put it here as a separate step.
  4. Update items, set quantity = 0
    Also perform other essential database manipulations.
  5. Make payment (via external api like PayPal or Stripe)
    No user interaction is necessary as payment details can be collected before checkout.
  6. If everything went fine commit transaction or rollback otherwise
  7. Continue with non-essential logic
    Send e-mail etc in case of success, redirect for error.

I am unsure if that is sufficient. I'm worried whether:

  1. Other user that tries to book same item at the same time will be handled correcly. Will his transaction T2 wait until T1 is done?
  2. Payment using PayPal or Stripe may take some time. Wouldn't this become a problem in terms of performance?
  3. Items availability will be shown correctly all the time (items should be available until checkout succeeds). Should these read-only selects use shared lock?
  4. Is it possible that MySql rollbacks transaction by itself? Is it generally better to retry automatically or display an error message and let user try again?
  5. I guess its enough if I do SELECT ... FOR UPDATE on items table. This way both request caused by double click and other user will have to wait till transaction finishes. They'll wait because they also use FOR UPDATE. Meanwhile vanilla SELECT will just see a snapshot of db before the transaction, with no delay though, right?
  6. If I use JOIN in SELECT ... FOR UPDATE, will records in both tables be locked?
  7. I'm a bit confused about SELECT ... FOR UPDATE on non-existent rows section of Willem Renzema answer. When may it become important? Could you provide any example?

Here are some resources I've read: How to deal with concurrent updates in databases?, MySQL: Transactions vs Locking Tables, Do database transactions prevent race conditions?, Isolation (database systems), InnoDB Locking and Transaction Model, A beginner’s guide to database locking and the lost update phenomena.

Rewrote my original question to make it more general.
Added follow-up questions.

like image 764
Paul Avatar asked Nov 22 '16 19:11

Paul


People also ask

How does the SQL Server database engine ensure the integrity of transactions?

The SQL Server Database Engine uses the following mechanisms to ensure the integrity of transactions and maintain the consistency of databases when multiple users are accessing data at the same time: Each transaction requests locks of different types on the resources, such as rows, pages, or tables, on which the transaction is dependent.

How do you ensure data integrity in a database?

Locks and locking mechanisms, like the two-phased locking mechanism, are some of the measures available to help preserve data integrity. Data integrity simply means that the accuracy and correctness of the data contained in a database must be preserved, and consistent data/information must always be provided.

Why do we use a lock in a transaction?

However, transaction used locks to help it to follow the ACID principles. If you want to the table to prevent others to read/write at the same time point while you are read/write, you need a lock to do this. If you want to make sure the data integrity and consistence, you had better use transactions.

Does the isolation level affect the locks acquired to protect data modifications?

Choosing a transaction isolation level does not affect the locks acquired to protect data modifications. A transaction always gets an exclusive lock on any data it modifies, and holds that lock until the transaction completes, regardless of the isolation level set for that transaction.


2 Answers

  1. Begin transaction
  2. Select items in shopping cart (with for update lock)

So far so good, this will at least prevent the user from doing checkout in multiple sessions (multiple times trying to checkout the same card - good to deal with double clicks.)

  1. Check if items haven't been booked by other user

How do you check? With a standard SELECT or with a SELECT ... FOR UPDATE? Based on step 5, I'm guessing you are checking a reserved column on the item, or something similar.

The problem here is that the SELECT ... FOR UPDATE in step 2 is NOT going to apply the FOR UPDATE lock to everything else. It is only applying to what is SELECTed: the cart-item table. Based on the name, that is going to be a different record for each cart/user. This means that other transactions will NOT be blocked from proceeding.

  1. Make payment
  2. Update items marking them as reserved
  3. If everything went fine commit transaction, rollback otherwise

Following the above, based on the information you've provided, you may end up with multiple people buying the same item, if you aren't using SELECT ... FOR UPDATE on step 3.

Suggested Solution

  1. Begin transaction
  2. SELECT ... FOR UPDATE the cart-item table.

This will lock a double click out from running. What you select here should be the some kind of "cart ordered" column. If you do this, a second transaction will pause here and wait for the first to finish, and then read the result what the first saved to the database.

Make sure to end the checkout process here if the cart-item table says it has already been ordered.

  1. SELECT ... FOR UPDATE the table where you record if an item has been reserved.

This will lock OTHER carts/users from being able to read those items.

Based on the result, if the items are not reserved, continue:

  1. UPDATE ... the table in step 3, marking the item as reserved. Do any other INSERTs and UPDATEs you need, as well.

  2. Make payment. Issue a rollback if the payment service says the payment didn't work.

  3. Record payment, if success.

  4. Commit transaction

Make sure you don't do anything that might fail between steps 5 and 7 (like sending emails), else you may end up with them making a payment without it being recorded, in the event the transaction gets rolled back.

Step 3 is the important step with regards to making sure two (or more) people don't try to order the same item. If two people do try, the 2nd person will end up having their webpage "hang" while it processes the first. Then when the first finishes, the 2nd will read the "reserved" column, and you can return a message to the user that someone has already purchased that item.

Payment in transaction or not

This is subjective. Generally, you want to close transactions as quickly as possible, to avoid multiple people being locked out from interacting with the database at once.

However, in this case, you actually do want them to wait. It's just a matter of how long.

If you choose to commit the transaction before payment, you'll need to record your progress in some intermediate table, run the payment, and then record the result. Be aware that if the payment fails, you'll then have to manually undo the item reservation records that you updated.

SELECT ... FOR UPDATE on non-existent rows

Just a word of warning, in case your table design involves inserting rows where you need to earlier SELECT ... FOR UPDATE: If a row doesn't exist, that transaction will NOT cause other transactions to wait, if they also SELECT ... FOR UPDATE the same non-existent row.

So, make sure to always serialize your requests by doing a SELECT ... FOR UPDATE on a row that you know exists first. Then you can SELECT ... FOR UPDATE on the row that may or may not exist yet. (Don't try to do just a SELECT on the row that may or may not exist, as you'll be reading the state of the row at the time the transaction started, not at the moment you run the SELECT. So, SELECT ... FOR UPDATE on non-existent rows is still something you need to do in order to get the most up to date information, just be aware it will not cause other transactions to wait.)

like image 63
Willem Renzema Avatar answered Oct 21 '22 15:10

Willem Renzema


1. Other user that tries to book same item at the same time will be handled correcly. Will his transaction T2 wait until T1 is done?

Yes. While active transaction keeps FOR UPDATE lock on a record, statements in other transactions that use any lock (SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, DELETE) will be suspended untill either active transaction commits or "Lock wait timeout" is exceeded.

2. Payment using PayPal or Stripe may take some time. Wouldn't this become a problem in terms of performance?

This will not be a problem, as this is exactly what is necessary. Checkout transactions should be executed sequentially, ie. latter checkout should not start before former finish.

3. Items availability will be shown correctly all the time (items should be available until checkout succeeds). Should these read-only selects use shared lock?

Repeatable reads isolation level ensures that changes made by a transaction are not visible until that transaction is commited. Therefore items availability will be displayed correctly. Nothing will be shown unavailable before it is actually paid for. No locks are necessary.

SELECT ... LOCK IN SHARE MODE would cause checkout transaction to wait until it is finished. This could slow down checkouts without giving any payoff.

4. Is it possible that MySql rollbacks transaction by itself? Is it generally better to retry automatically or display an error message and let user try again?

It is possible. Transaction may be rolled back when "Lock wait timeout" is exceeded or when deadlock happens. In that case it would be a good idea to retry it automatically.
By default suspended statements fail after 50s.

5. I guess its enough if I do SELECT ... FOR UPDATE on items table. This way both request caused by double click and other user will have to wait till transaction finishes. They'll wait because they also use FOR UPDATE. Meanwhile vanilla SELECT will just see a snapshot of db before the transaction, with no delay though, right?

Yes, SELECT ... FOR UPDATE on items table should be enough.
Yes, these selects wait, because FOR UPDATE is an exclusive lock.
Yes, simple SELECT will just grab value as it was before transaction started, this will happen immediately.

6. If I use JOIN in SELECT ... FOR UPDATE, will records in both tables be locked?

Yes, SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, DELETE lock all read records, so whatever we JOIN is included. See MySql Docs.

What's interesting (at least for me) everything that is scanned in the processing of the SQL statement gets locked, no matter wheter it is selected or not. For example WHERE id < 10 would lock also the record with id = 10!

If you have no indexes suitable for your statement and MySQL must scan the entire table to process the statement, every row of the table becomes locked, which in turn blocks all inserts by other users to the table. It is important to create good indexes so that your queries do not unnecessarily scan many rows.

like image 45
Paul Avatar answered Oct 21 '22 14:10

Paul