Suppose that I've written a WSGI application. I run this application on Apache2 on Linux with a multi-threaded mod_wsgi configuration, so that my application runs in many threads per process:
    WSGIDaemonProcess mysite processes=3 threads=2 display-name=mod_wsgi
    WSGIProcessGroup mysite
    WSGIScriptAlias / /some/path/wsgi.py
The application code is:
    def application(environ, start_response):
        from foo import racer
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        start_response(status, response_headers)
        # Under Python 3, WSGI response bodies must be bytes, hence .encode().
        return [racer().encode('ascii')]  # call to racer creates a race condition?
Module foo.py:

    a = 1

    def racer():
        global a
        a = a + 1
        return str(a)
Did I just create a race condition with the variable a? I guess a is a module-level variable that exists in foo.py and is the same (shared) among threads?
More theoretical questions derived from this:
1. Is the a variable shared among threads, so that my example is not thread-safe?
2. In the case of Apache, is each thread of my application on Linux created at the C level with the pthreads API, and is the function that each pthread executes some kind of Python interpreter main function? Or does Apache protect me somehow from this error?
3. What about Tornado's HTTPServer? That web server, written in Python, implements threads as Python-level threading.Thread objects and runs the application function in each thread. So, I suppose it's a race condition? (I also suppose that in this case I can abstract away the C-level pthreads underneath the threading.Thread implementation and worry only about Python functions, because the interpreter won't allow me to modify C-level shared data and break its functioning. So the only way for me to break thread safety is through global variables? Is that right?)

wsgi.multithread is always true, regardless of whether Apache is using multiple threads or not. wsgi.multiprocess sometimes produces unexpected values.
The Web Server Gateway Interface (WSGI, pronounced whiskey or WIZ-ghee) is a simple calling convention for web servers to forward requests to web applications or frameworks written in the Python programming language. The current version of WSGI, version 1.0.1, is specified in Python Enhancement Proposal (PEP) 3333.
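To make the calling convention concrete, here is a minimal sketch that plays the server's role by hand: it builds an environ dict, passes a start_response callable, and joins the returned iterable. The helper name call_app and the hello body are illustrative assumptions; wsgiref.util.setup_testing_defaults is the standard-library function that fills in required environ keys.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    # A tiny WSGI app in the same shape as the one in the question.
    start_response("200 OK", [("Content-type", "text/plain")])
    return [b"hello"]


def call_app(app):
    # Hypothetical helper that acts as a minimal "server" for one request.
    environ = {}
    setup_testing_defaults(environ)  # adds REQUEST_METHOD, wsgi.input, etc.
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        # PEP 3333: the server must call close() if the iterable has one.
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], body


status, headers, body = call_app(application)
print(status, body)  # 200 OK b'hello'
```

A real server such as mod_wsgi does the same thing once per request, possibly from several threads at once, which is why module-level state touched by application() needs care.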
Yes, you have a race condition there, but it's not related to the imports. The global state in foo.a is subject to a data race between reading a in a + 1 and the assignment a = ...: two threads can read the same value of a and thus compute the same successor, so one of the increments is lost.
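The lost update can be reproduced deterministically with a small sketch. It is an illustration, not what mod_wsgi itself does: it splits a = a + 1 into an explicit read and write, and uses a threading.Barrier (an addition for the demo) to force both threads to read a before either writes. That is one of the interleavings the scheduler is free to produce on its own.

```python
import threading

a = 1


def racy_increment(barrier):
    global a
    snapshot = a          # step 1: read the shared global
    barrier.wait()        # demo-only: make both threads finish reading first
    a = snapshot + 1      # step 2: write a value computed from a stale read


barrier = threading.Barrier(2)
threads = [threading.Thread(target=racy_increment, args=(barrier,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a)  # 2, not 3: one increment was lost
```

Without the barrier the bad interleaving is merely unlikely, not impossible, which is what makes this kind of bug hard to catch in testing.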
The import machinery itself does protect against duplicate imports by multiple threads, by means of a process-wide lock (see imp.lock_held()). Although this could, in theory, lead to a deadlock, this almost never happens, because few Python modules lock other resources at import time. (On Python 3.3 and later the single global lock was replaced by per-module import locks, and the imp module was removed in Python 3.12.)
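A small sketch can check the "imported only once" claim on a modern Python 3. It writes a deliberately slow, hypothetical module (slow_mod) to a temporary directory and imports it from eight threads at once; a hypothetical in-memory registry module (demo_registry) records each time the module body runs.

```python
import importlib
import sys
import tempfile
import threading
import types
from pathlib import Path

# Hypothetical registry so the demo module can record each execution of its body.
registry = types.ModuleType("demo_registry")
registry.runs = []
sys.modules["demo_registry"] = registry

# Hypothetical module whose body sleeps, widening the window for a race.
tmpdir = tempfile.mkdtemp()
Path(tmpdir, "slow_mod.py").write_text(
    "import time\n"
    "import demo_registry\n"
    "demo_registry.runs.append(1)\n"
    "time.sleep(0.2)\n"
    "value = 42\n"
)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

results = []


def worker():
    results.append(importlib.import_module("slow_mod"))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(registry.runs))                      # 1: the module body ran once
print(all(m is results[0] for m in results))  # True: every thread got the same module
```

The import lock only protects the import itself; once foo is loaded, calls to foo.racer() from different threads are not serialized by it.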
This also suggests that it's probably safe to modify sys.path at will, since this usually happens only at import time (for the purpose of additional imports). That thread already holds the import lock, so other threads cannot cause imports that would also modify that state.
Fixing the race in racer() is quite easy, though:
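    import threading

    a = 1
    a_lock = threading.Lock()  # guards every read-modify-write of a

    def racer():
        global a
        with a_lock:
            # Increment and keep a local copy while holding the lock, so the
            # returned value is the one this call produced.
            my_a = a = a + 1
        return str(my_a)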
The same kind of locking is needed for any global, mutable state under your control.
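A quick stress test of the locked version might look like the sketch below. The thread and call counts are arbitrary assumptions; with the lock, the final value should always be the starting value plus the total number of calls, and no two calls should return the same string.

```python
import threading

a = 1
a_lock = threading.Lock()


def racer():
    global a
    with a_lock:
        my_a = a = a + 1
    return str(my_a)


THREADS, CALLS = 8, 10_000  # arbitrary sizes for the demo
seen = []
seen_lock = threading.Lock()


def worker():
    local = [racer() for _ in range(CALLS)]
    with seen_lock:
        seen.extend(local)


threads = [threading.Thread(target=worker) for _ in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a)                            # 80001
print(len(set(seen)) == len(seen))  # True: every call got a distinct value
```

Note that with processes=3 in the configuration above, each daemon process still has its own copy of a, so the lock makes the counter consistent per process, not across the whole site.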
Read the mod_wsgi documentation on the various process and thread configurations, especially the part about data sharing. Among other things, it says:
Where global data in a module local to a child process is still used, for example as a cache, access to and modification of the global data must be protected by local thread locking mechanisms.
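As a sketch of what "protected by local thread locking" can mean for a module-level cache: the names expensive_lookup and cached_lookup below are hypothetical, and holding one lock across check, compute and store is only one possible design.

```python
import threading

compute_calls = []  # records how often the expensive function actually runs


def expensive_lookup(key):
    # Hypothetical stand-in for slow work such as a database query.
    compute_calls.append(key)
    return key.upper()


_cache = {}
_cache_lock = threading.Lock()


def cached_lookup(key):
    # Holding the lock across check, compute and store means each key is
    # computed at most once per process; the trade-off is that lookups for
    # other keys wait while one value is being computed.
    with _cache_lock:
        if key not in _cache:
            _cache[key] = expensive_lookup(key)
        return _cache[key]


threads = [threading.Thread(target=cached_lookup, args=("foo",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(compute_calls)          # ['foo']
print(cached_lookup("foo"))   # FOO
```

If computations are slow and keys are many, a finer-grained scheme (for example one lock per key) may be preferable, at the cost of more code.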