Python, WSGI, multiprocessing and shared data

I am a bit confused about the multiprocessing feature of mod_wsgi, and about the general design of WSGI applications that would be executed on WSGI servers with multiprocessing ability.

Consider the following directive:

WSGIDaemonProcess example processes=5 threads=1 

If I understand correctly, mod_wsgi will spawn 5 Python (e.g. CPython) processes and any of these processes can receive a request from a user.

The documentation says that:

Where shared data needs to be visible to all application instances, regardless of which child process they execute in, and changes made to the data by one application are immediately available to another, including any executing in another child process, an external data store such as a database or shared memory must be used. Global variables in normal Python modules cannot be used for this purpose.

But in that case it becomes really cumbersome to make sure an app runs correctly under any WSGI conditions (including multiprocessing ones).

For example, take a simple variable that holds the current number of connected users. Should it be read and written in a process-safe way from/to memcached, a DB, or (if such out-of-the-standard-library mechanisms are available) shared memory?

And will the code like

    counter = 0

    @app.route('/login')
    def login():
        global counter
        ...
        counter += 1
        ...

    @app.route('/logout')
    def logout():
        global counter
        ...
        counter -= 1
        ...

    @app.route('/show_users_count')
    def show_users_count():
        return str(counter)

behave unpredictably in a multiprocessing environment?

Thank you!

Zaur Nasibov asked Oct 03 '12

People also ask

How data is shared in multiprocessing in Python?

multiprocessing provides two methods of doing this: shared memory (suitable for simple values, arrays, or ctypes) and a Manager proxy, where one process holds the data and a manager arbitrates access to it from other processes (even over a network).
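For the shared-memory route, a minimal sketch using multiprocessing.Value (the worker function and process count are illustrative):

    from multiprocessing import Process, Value

    def worker(counter):
        # get_lock() guards the read-modify-write, so concurrent
        # increments from several processes are not lost.
        with counter.get_lock():
            counter.value += 1

    if __name__ == '__main__':
        counter = Value('i', 0)  # a signed int living in shared memory
        procs = [Process(target=worker, args=(counter,)) for _ in range(5)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(counter.value)  # prints 5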

Is WSGI multithreaded?

It is possible that a WSGI application could be executed at the same time from multiple worker threads within the one child process. This means that multiple worker threads may want to access common shared data at the same time.
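Within one process, that is the classic use case for a lock; a minimal sketch (the counter is illustrative):

    import threading

    counter = 0
    counter_lock = threading.Lock()

    def increment():
        global counter
        # Serialize the read-modify-write so two worker threads
        # cannot interleave and lose an update.
        with counter_lock:
            counter += 1

Note that a threading.Lock only protects data within one process; it does nothing across child processes.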

How does Python WSGI work?

A WSGI application is just a callable object that is passed an environ (a dict that contains the request data) and a start_response function that is called to start sending the response. To send data to the server, all you have to do is call start_response and return an iterable.
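A minimal application in that style:

    def application(environ, start_response):
        body = b'Hello, WSGI!'
        start_response('200 OK', [('Content-Type', 'text/plain'),
                                  ('Content-Length', str(len(body)))])
        return [body]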


1 Answer

There are several aspects to consider in your question.

First, consider the interaction between Apache MPMs and mod_wsgi applications. If you run the mod_wsgi application in embedded mode (no WSGIDaemonProcess needed, WSGIProcessGroup %{GLOBAL}), you inherit multiprocessing/multithreading from the Apache MPM. This should be the fastest option, and you end up with multiple processes and multiple threads per process, depending on your MPM configuration. If instead you run mod_wsgi in daemon mode, with WSGIDaemonProcess <name> [options] and WSGIProcessGroup <name>, you get fine control over multiprocessing/multithreading at the cost of a small overhead.

Within a single apache2 server you may define zero, one, or more named WSGIDaemonProcess groups, and each application can be run in one of them (WSGIProcessGroup <name>) or in embedded mode with WSGIProcessGroup %{GLOBAL}.
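As a sketch, the two styles side by side (paths and names here are illustrative):

    # Daemon mode: 5 single-threaded processes dedicated to this app
    WSGIDaemonProcess example processes=5 threads=1
    WSGIScriptAlias / /var/www/app/app.wsgi
    WSGIProcessGroup example

    # Embedded mode: processes/threads are inherited from the Apache MPM
    WSGIScriptAlias / /var/www/app/app.wsgi
    WSGIProcessGroup %{GLOBAL}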

You can check multiprocessing/multithreading from within the application by inspecting the wsgi.multithread and wsgi.multiprocess variables in the WSGI environ.
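For instance, a throwaway app that reports the flags it runs under:

    def application(environ, start_response):
        body = ('multiprocess=%s multithread=%s' % (
            environ['wsgi.multiprocess'], environ['wsgi.multithread'])).encode()
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [body]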

With your configuration, WSGIDaemonProcess example processes=5 threads=1, you get 5 independent processes, each with a single thread of execution: no global data and no shared memory, since you are not in control of spawning the subprocesses; mod_wsgi does it for you. To share a global state you already listed some possible options: a DB that your processes talk to, some sort of file-system-based persistence, or a daemon process (started outside apache) plus socket-based IPC.

As pointed out by Roland Smith, the latter could be implemented using the high-level API of multiprocessing.managers: outside apache you create and start a BaseManager server process,

    import multiprocessing.managers

    m = multiprocessing.managers.BaseManager(address=('', 12345),
                                             authkey=b'secret')
    m.get_server().serve_forever()

and inside your apps you connect to it:

    import multiprocessing.managers

    m = multiprocessing.managers.BaseManager(address=('', 12345),
                                             authkey=b'secret')
    m.connect()

The example above is a dummy, since m has no useful method registered, but in the Python docs you will find how to create and proxy an object (like the counter in your example) among your processes.
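To make that concrete, here is a minimal sketch of proxying such a counter; the Counter class, its methods and the port number are illustrative, not a fixed API. The server side, started outside apache:

    import threading
    from multiprocessing.managers import BaseManager

    class Counter(object):
        def __init__(self):
            self.value = 0
            # The manager server handles clients in threads,
            # so guard the counter with a lock.
            self._lock = threading.Lock()

        def incr(self):
            with self._lock:
                self.value += 1

        def decr(self):
            with self._lock:
                self.value -= 1

        def get(self):
            return self.value

    shared = Counter()

    class CounterManager(BaseManager):
        pass

    # Always hand out the same instance, so every client shares it.
    CounterManager.register('get_counter', callable=lambda: shared)

    CounterManager(address=('', 12345),
                   authkey=b'secret').get_server().serve_forever()

and the client side, inside each WSGI process:

    from multiprocessing.managers import BaseManager

    class CounterManager(BaseManager):
        pass

    CounterManager.register('get_counter')

    m = CounterManager(address=('localhost', 12345), authkey=b'secret')
    m.connect()
    counter = m.get_counter()  # a proxy object; method calls go to the server
    counter.incr()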

A final comment on your example with processes=5 threads=1. I understand that this is just an example, but in real-world applications I suspect that performance will be comparable to that of processes=1 threads=5: you should go into the intricacies of sharing data between processes only if the expected performance boost over the 'single process, many threads' model is significant.

Stefano M answered Oct 06 '22