I am working on a REST API for a web service using Pyramid and Cornice; data on the server side is handled using SQLAlchemy and MySQL. The web server is nginx with uWSGI, configured to run multiple Python processes:
[uwsgi]
socket = localhost:6542
plugins = python34
...
processes = 2 # spawn the specified number of workers/processes
threads = 2 # run each worker in prethreaded mode with the specified number of threads
Problem
Suppose there is a table customers on the server side. Using the API one can read customer data, modify it, or delete it. In addition to that, there are other API functions which read customer data.
I could issue multiple API calls at the same time which then compete for the same customer resource:
# Write/modify the customer {id} data
curl --request POST ... https://some.host/api/customer/{id}
# Delete customer {id} and all of its associated data
curl --request DELETE https://some.host/api/customer/{id}
# Perform some function which reads customer {id}
curl --request GET ... https://some.host/api/do-work
Essentially this is a readers-writers problem, but because more than one process is involved, traditional thread synchronization using locks/mutexes/semaphores won't work here: each uWSGI worker is a separate process with its own memory, so an in-process lock cannot be shared between them.
Question
I'd like to understand the best way to implement locking and synchronization for such a Pyramid-based web API, so that concurrent calls like those in the example above are handled safely and efficiently (i.e. without unnecessary serialization).
Solutions (?)
One solution that comes to mind would be to mark the customer {id} as locked in the database, but is that reliable, given that SQLAlchemy caches such modifications, and flush() doesn't seem atomic enough in this context?

I assume you are dealing with one MySQL database and your locks do not need to cover other resources (Redis, third-party APIs, etc.). I also assume your client-side functions do not themselves need to maintain transactional state over several API calls; you just want to prevent concurrent API access from messing up your database.
There are two kinds of locking: pessimistic locking and optimistic locking.
Pessimistic locking is what most people usually mean by locking: you create and acquire locks beforehand, programmatically in code. This is what a distributed lock manager provides.
Optimistic locking is something you get almost for free with SQL databases. If two transactions compete for the same resource, the database effectively dooms one of the transactions, and the application framework (in this case Pyramid + pyramid_tm) can retry the transaction N times before giving up.
Optimistic locking is the more attractive solution from a development point of view, as it puts no cognitive load on the application developer to remember to lock resources correctly or to build in-house locking mechanisms. Instead, the developer relies on the framework and the database to retry and manage concurrency situations. However, optimistic locking is not that well known among web developers, because doing it in widespread PHP-esque environments is difficult due to the lack of flexibility in the programming language.
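The retry behaviour described above can be sketched in plain Python. This is an illustration only: ConflictError stands in for whatever serialization-failure exception your database driver raises, and retry_on_conflict mimics what pyramid_tm does for each HTTP request:

```python
class ConflictError(Exception):
    """Stand-in for a database serialization failure (e.g. MySQL deadlock error 1213)."""

def retry_on_conflict(func, attempts=3):
    """Re-run func until it succeeds or the retry budget is exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConflictError:
            if attempt == attempts:
                raise  # give up; surface the conflict to the API consumer

# Simulate a transaction that is doomed once, then succeeds on retry.
calls = {"n": 0}

def update_customer():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConflictError("another transaction won the race")
    return "committed"

result = retry_on_conflict(update_customer)
print(result)  # -> committed (on the second attempt)
```

Note that func must be safe to re-run from scratch, which is exactly why tying the retry to a whole HTTP request works so well.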
pyramid_tm implements an optimistic locking solution, and I would recommend using it (or some other optimistic locking solution) unless you have a very specific reason not to:
- pyramid_tm ties the transaction life cycle to the HTTP request, which is very natural from a web developer's point of view
- pyramid_tm can tie other events to successful transactions; e.g. pyramid_mailer sends out email to users only if the transaction commits
- pyramid_tm is well tested and based on the ZODB transaction package, whose transaction manager has been in production use since the early 2000s
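A minimal wiring sketch for the above (a configuration fragment, assuming pyramid, pyramid_tm, and pyramid_retry are installed; the retry.attempts value is illustrative):

```python
from pyramid.config import Configurator

settings = {
    # pyramid_retry re-runs the whole request when the transaction is doomed
    "retry.attempts": 3,
}

config = Configurator(settings=settings)
config.include("pyramid_tm")     # tie the transaction life cycle to the HTTP request
config.include("pyramid_retry")  # retry the request on transient conflict errors
app = config.make_wsgi_app()
```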
Make sure your SQLAlchemy session is set to the SERIALIZABLE SQL isolation level, so you start with the highest consistency model. You can relax this requirement for performance where you know an API call tolerates it, e.g. calls doing read-only statistical analysis.
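With SQLAlchemy, the isolation level can be set on the engine; a sketch with a placeholder connection URL, so it needs a real MySQL server to run:

```python
from sqlalchemy import create_engine

# Start with the highest consistency model (placeholder MySQL URL).
engine = create_engine(
    "mysql+pymysql://user:password@localhost/mydb",
    isolation_level="SERIALIZABLE",
)

# A relaxed engine for API calls that tolerate it, e.g. read-only statistics:
stats_engine = engine.execution_options(isolation_level="READ COMMITTED")
```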
Optimistic locking usually performs better in "normal" workloads with lots of reads and few writes, where conflicts (two API calls updating the same user at once) are rare. The transaction retry penalty hits only when there is a conflict.
If the transaction ultimately fails after N retries, e.g. under an unusually high load, this should be resolved on the API consumer side, by telling the user that the server-side data has changed and that they must verify or refill the form again.
Further reading
- Optimistic concurrency control on Wikipedia
- SQLAlchemy + pyramid_tm example. Note: try to avoid a global DBSession object and use request.dbsession instead.
- Race condition incidence examples
- ConflictResolver: an alternative, more low-level optimistic locking solution for SQLAlchemy, based on using Python functions as the retryable context instead of the full HTTP request