python + wsgi on a multi-threaded web-server: is this a race condition?

Suppose that I've written a WSGI application. I run it on Apache2 on Linux with a multi-threaded mod_wsgi configuration, so that each process runs my application in several threads:

WSGIDaemonProcess mysite processes=3 threads=2 display-name=mod_wsgi
WSGIProcessGroup mysite
WSGIScriptAlias / /some/path/wsgi.py

The application code is:

def application(environ, start_response):
    from foo import racer
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [racer().encode('ascii')]  # WSGI bodies must be bytes; does the call to racer create a race condition?

module foo.py:

a = 1
def racer():
    global a
    a = a + 1
    return str(a)

Did I just create a race condition on the variable a? My guess is that a is a module-level variable that lives in foo.py and is shared among all threads of a process.

More theoretical questions derived from this:

  1. Concurrent threads within the same process access and modify the same variable a, so my example is not thread-safe?
  2. If my web server is Apache, is each thread of my application on Linux created at the C level with the pthreads API, with the thread's start function being some kind of main function of the Python interpreter? Or does Apache somehow protect me from this error?
  3. What if I ran this on a web server written in Python, such as Tornado's HTTPServer? Suppose such a server implements threads as Python-level threading.Thread objects and runs the application function in each thread. Then I suppose it is a race condition? (I also suppose that in this case I can ignore the C-level pthreads underneath the threading.Thread implementation and worry only about Python functions, because the interpreter won't let me modify C-level shared data and break its operation. So the only way for me to break thread safety is through global variables. Is that right?)
Boris Burkov asked May 15 '14 18:05


2 Answers

Yes, you have a race condition there, but it is not related to the import. The global foo.a is subject to a data race because a = a + 1 is not atomic: reading a, adding 1, and assigning the result are separate steps, so two threads can read the same value of a and both store the same successor, losing one increment.
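One way to see that the increment is not a single step is to look at the bytecode CPython compiles it to. This is only a sketch: racer() is redefined locally so the block is self-contained, opcode_names is a hypothetical helper, and the exact opcodes differ between CPython versions. What stays the same is that the load of a and the store to a are separate instructions, and a thread switch can happen between them.

```python
import dis

a = 1

def racer():
    global a
    a = a + 1
    return str(a)

def opcode_names(func):
    """Return the names of the bytecode instructions of func, in order."""
    return [instruction.opname for instruction in dis.get_instructions(func)]

if __name__ == "__main__":
    # Expect a LOAD_GLOBAL for `a` some instructions before the STORE_GLOBAL.
    for name in opcode_names(racer):
        print(name)
```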

The import machinery itself does protect against duplicate imports by multiple threads by means of locking: a single process-wide import lock in Python 2 (see imp.lock_held()), and per-module import locks since Python 3.3. Although this could, in theory, lead to a deadlock, that almost never happens, because few Python modules lock other resources at import time.
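A rough way to observe this protection, assuming CPython 3.3 or later: write a throwaway module whose body logs every execution and then sleeps, and import it from several threads at once. The module name slow_demo_mod and the helper count_module_body_runs are made up for this sketch. If the import machinery serializes the import, the body runs once even though several threads requested it.

```python
import importlib
import os
import sys
import tempfile
import threading

def count_module_body_runs(thread_count=4):
    """Import a slow throwaway module from several threads; count body executions."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "runs.log")
        # The module body appends one line per execution, then sleeps so that
        # the importing threads overlap.
        with open(os.path.join(tmp, "slow_demo_mod.py"), "w") as f:
            f.write(
                "import time\n"
                f"with open({log_path!r}, 'a') as log:\n"
                "    log.write('ran\\n')\n"
                "time.sleep(0.2)\n"
            )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            threads = [
                threading.Thread(target=importlib.import_module,
                                 args=("slow_demo_mod",))
                for _ in range(thread_count)
            ]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            with open(log_path) as log:
                return len(log.readlines())
        finally:
            # Undo the demo's changes to interpreter-wide state.
            sys.path.remove(tmp)
            sys.modules.pop("slow_demo_mod", None)

if __name__ == "__main__":
    print(count_module_body_runs())
```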

This also suggests that modifying sys.path is probably safe in practice: it usually happens only at import time (for the purpose of additional imports), so the modifying thread already holds the import lock, and other threads cannot trigger imports that would modify that state at the same time.

Fixing the race in racer() is quite easy, though:

import threading
a = 1
a_lock = threading.Lock()

def racer():
    global a
    with a_lock:
        my_a = a = a + 1
    return str(my_a)

The same kind of locking is needed for any global, mutable state under your control.
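A quick way to check the locked version is to call it from several threads and confirm that no value is handed out twice. This is a sketch only: hammer is a hypothetical helper, and the thread and call counts are arbitrary.

```python
import threading

a = 1
a_lock = threading.Lock()

def racer():
    global a
    with a_lock:
        my_a = a = a + 1
    return str(my_a)

def hammer(thread_count=8, calls_per_thread=1000):
    """Call racer() from several threads and return every value it produced."""
    results = []
    results_lock = threading.Lock()

    def worker():
        local = [racer() for _ in range(calls_per_thread)]
        # The shared results list is itself global-like state, so guard it too.
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    values = hammer()
    print(len(values), "calls,", len(set(values)), "distinct values")
```

With the lock in place, the number of distinct values should equal the number of calls. Without it, duplicates can appear, though a lost update may take many runs to show up.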

SingleNegationElimination answered Oct 29 '22 10:10


Read the mod_wsgi documentation about the various processes/thread configurations and in particular what it says about data sharing.

  • http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading#Building_A_Portable_Application

In particular it says:

Where global data in a module local to a child process is still used, for example as a cache, access to and modification of the global data must be protected by local thread locking mechanisms.
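As an illustration of the quoted advice, here is a minimal sketch of a module-level cache guarded by a lock. The names _cache, _cache_lock and get_or_compute are hypothetical and do not come from the mod_wsgi documentation.

```python
import threading

_cache = {}                      # module-level state shared by all threads in the process
_cache_lock = threading.Lock()   # guards every read and write of _cache

def get_or_compute(key, compute):
    """Return the cached value for key, calling compute(key) at most once."""
    with _cache_lock:
        if key not in _cache:
            _cache[key] = compute(key)
        return _cache[key]
```

Holding the lock while compute runs keeps the sketch simple but serializes all cache misses. Each daemon process still has its own copy of _cache, so with processes=3 the value may be computed up to three times overall.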

Graham Dumpleton answered Oct 29 '22 09:10