Pyramid REST API: How do I handle concurrent data access safely?

I am working on a REST API for a web service using Pyramid and Cornice; data on the server side is handled using SQLAlchemy and MySQL. The web server is nginx using uwsgi, and it's configured to run multiple Python processes:

[uwsgi]
socket = localhost:6542
plugins = python34
...
processes = 2 # spawn the specified number of workers/processes
threads = 2 # run each worker in prethreaded mode with the specified number of threads

Problem

Suppose there is a customers table on the server side. Using the API one can read customer data, modify it, or delete it. In addition to that there are other API functions which read customer data.

I could issue multiple API calls at the same time which then compete for the same customer resource:

# Write/modify the customer {id} data
curl --request POST ... https://some.host/api/customer/{id}
# Delete customer {id} and all of its associated data
curl --request DELETE https://some.host/api/customer/{id}
# Perform some function which reads customer {id}
curl --request GET ... https://some.host/api/do-work

Essentially this is the Readers-Writers Problem, but because more than one process is involved, traditional in-process thread synchronization using locks/mutexes/semaphores won't work here.

Question

I'd like to understand the best way to implement locking and synchronization for such a Pyramid based web API, such that concurrent calls like in the above example are handled safely and efficiently (i.e. without unnecessary serializing).

Solutions (?)

  • I don't think it makes sense to mark/flag customer {id} as locked because SQLAlchemy caches such modifications, and flush() doesn't seem atomic enough in this context?
  • This article describes using the HTTP ETag to manage shared resources.
  • One could also use Redis as a distributed lock manager for a spinlock to wrap a view function?
  • What about Pyramid's transaction manager?
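To make the ETag idea from the list above concrete, here is a minimal sketch of an If-Match precondition check. The helper names and the per-row version counter are assumptions for illustration, not part of any Pyramid API; a real PUT/POST view would respond 412 Precondition Failed when the check returns False.

```python
import hashlib
from typing import Optional

def make_etag(row_id: int, row_version: int) -> str:
    """Derive an ETag from a hypothetical per-row version counter."""
    return hashlib.sha1(f"{row_id}:{row_version}".encode()).hexdigest()

def precondition_ok(stored_etag: str, if_match_header: Optional[str]) -> bool:
    """True if the client's If-Match header still matches the current
    representation, i.e. nobody modified the row since the client read it."""
    if if_match_header is None:
        return True  # no precondition supplied; proceed (or reject with 428)
    return if_match_header.strip('"') == stored_etag
```

A competing writer bumps the row version, so a stale If-Match no longer matches and the late writer is rejected instead of silently overwriting.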
asked Nov 26 '15 by Jens

1 Answer

I assume you are dealing with one MySQL database and your locks do not need to cover other resources (Redis, third party APIs, etc.). I also assume your client-side functions do not themselves need to maintain transactional state across several API calls; you just want to prevent concurrent API access from messing up your database.

There are two kinds of locking, pessimistic locking and optimistic locking.

Pessimistic locking is what most people usually mean by locking - you create and acquire locks beforehand, programmatically in code. This is what a distributed lock manager does.

Optimistic locking is what you can quite easily get with SQL databases. If two transactions compete for the same resource, the database effectively dooms one of the transactions, and the application framework (in this case Pyramid + pyramid_tm) can retry the transaction N times before giving up.

Optimistic locking is the more elegant solution from the development point of view, as it does not put any cognitive load on the application developer to remember to lock resources correctly or to create in-house locking mechanisms. Instead, the developer relies on the framework and the database to retry and manage concurrency situations. However, optimistic locking is not that well known among web developers, because doing optimistic locking in widespread PHP'esque environments is difficult due to a lack of flexibility in the programming language.

pyramid_tm implements an optimistic locking solution and I would recommend using it, or some other optimistic locking solution, unless you have a very specific reason not to.
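In essence, the retry behaviour pyramid_tm provides amounts to the loop below. This is a hypothetical standalone sketch, not pyramid_tm's actual code: the `ConflictError` class stands in for the serialization failure the database reports when it dooms a transaction.

```python
class ConflictError(Exception):
    """Stands in for a serialization failure reported by the database."""

def run_with_retries(txn_body, attempts=3):
    """Run txn_body() in a fresh transaction; on a conflict, retry up to
    `attempts` times, then re-raise so the API can return an error response."""
    for attempt in range(attempts):
        try:
            return txn_body()  # commit happens here if no exception is raised
        except ConflictError:
            # the doomed transaction is rolled back; try again with fresh data
            if attempt == attempts - 1:
                raise
```

With pyramid_tm the equivalent retry is tied to the whole HTTP request, so view code stays free of any explicit locking or retry logic.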

  • pyramid_tm ties the transaction life cycle to the HTTP request, which is very natural from the web developer's point of view

  • pyramid_tm can tie other events to successful transactions, e.g. pyramid_mailer sends out email to users only if the transaction commits

  • pyramid_tm is well tested and based on the ZODB transaction transaction manager, which has been in production use since the early 2000s

  • Make sure your SQLAlchemy session is set to the SERIALIZABLE SQL isolation level - that way you start with the highest consistency model. You can lower this requirement for performance if you know specific API calls tolerate it, e.g. calls doing read-only statistical analysis.

  • Optimistic locking usually performs better in "normal" workloads with many reads and few writes, where conflicts (two API calls updating the same customer at once) are rare. The transaction retry penalty applies only when a conflict occurs.

  • If the transaction ultimately fails after N retries, e.g. under an unusually high load, this should be surfaced to the API consumer, telling the user that the server-side data has changed and that they must verify or refill the form
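The points above can be illustrated at the SQL level with a version-column update, which is the core trick behind optimistic locking. A sketch, with sqlite3 (stdlib) standing in for MySQL and made-up table/column names; in the real application SQLAlchemy and pyramid_tm would do the equivalent for you:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Alice', 1)")

def update_name(conn, cust_id, seen_version, new_name):
    """Update only if nobody changed the row since we read it at
    version=seen_version. Returns True on success; False signals a
    conflict, and the caller should re-read and retry."""
    cur = conn.execute(
        "UPDATE customers SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, cust_id, seen_version),
    )
    return cur.rowcount == 1  # 0 rows touched means someone got there first
```

The first writer wins and bumps the version; a second writer still holding the stale version number touches zero rows and knows it has lost the race, instead of silently overwriting the first write.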

Further reading

  • Optimistic concurrency control in Wikipedia

  • SQLAlchemy + pyramid_tm example. Note: try to avoid a global DBSession object and use request.dbsession instead.

  • Race condition incidence examples

  • ConflictResolver: an alternative, lower-level optimistic locking solution for SQLAlchemy, based on using Python functions as the retryable context instead of a full HTTP request

answered Sep 28 '22 by Mikko Ohtamaa