What exactly happens on the computer when multiple requests come to the webserver serving a Django or Pyramid application?

Tags:

python

multithreading

webserver

django

pyramid

I am having a hard time trying to figure out the big picture of how multiple requests are handled by the uWSGI server with a Django or Pyramid application.

My understanding at the moment is this: when multiple HTTP requests are sent to the uWSGI server concurrently, the server creates separate processes or threads (copies of itself) for each request (or assigns the requests to existing ones), and every process/thread loads the web application's code (say, Django or Pyramid) into the computer's memory, executes it, and returns the response. Meanwhile, every copy of the code can access the session, cache, or database. There is usually a separate database server, and it can also handle concurrent requests to the database.

So here are some questions I am struggling with:

1. Is my understanding above correct or not?
2. Do the copies of the code interact with each other somehow, or are they completely separated from each other?
3. What about the session or cache? Are they shared between the copies, or are they local to each copy?
4. How are they created: by the webserver or by the copies of the Python code?
5. How are responses returned to the requesters: by each process concurrently, or are they put into some kind of queue and sent synchronously?

I have googled these questions and found very interesting answers on Stack Overflow, but I still can't get the whole picture, and the whole process remains a mystery to me. It would be fantastic if someone could explain the whole picture in terms of Django or Pyramid with uWSGI or any other webserver.

Sorry if these are dumb questions, but they really torment me every night, and I am looking forward to your help :)

766

asked Aug 04 '16 12:08

sehrob

2 Answers

There's no magic in Pyramid or Django that gets you past process boundaries. The answers depend entirely on the particular server you've chosen and the settings you've configured. For example, uWSGI can run multiple threads and multiple processes. If uWSGI spins up multiple processes, each one has its own copy of the data, which is not shared unless you take the time to set up some form of IPC (this is why you should keep state in a third-party store like a database instead of in in-memory objects, which are not shared across processes). Each process initializes a WSGI object (let's call it app), which the server calls via body_iter = app(environ, start_response). This app object is shared across all of the threads in the process and is invoked concurrently, so it needs to be thread-safe (usually the structures the app uses are thread-local, read-only, or themselves thread-safe to deal with this, for example a connection pool to the database).

In general, the answer to your questions is that things happen concurrently and that objects may or may not be shared depending on your server model; either way, anything you want to be shared should be stored somewhere that can handle concurrency properly (a database).
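To make "store it somewhere that can handle concurrency properly" concrete, here is a hedged sketch that uses SQLite from Python's standard library as that shared store. Threads stand in for separate worker processes to keep it portable; each opens its own connection and shares no Python objects with the others, and each increment is a single `UPDATE`, so the database rather than Python serializes the writes. The table, file name, and function names are made up for the example; a real deployment would more likely use a database server such as PostgreSQL.

```python
import os
import sqlite3
import tempfile
import threading


def init_db(path):
    """Create the shared counter table once; safe to call repeatedly."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits the transaction on success
            conn.execute(
                "CREATE TABLE IF NOT EXISTS counters "
                "(name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
            )
            conn.execute("INSERT OR IGNORE INTO counters (name, value) VALUES ('hits', 0)")
    finally:
        conn.close()


def worker(path, increments):
    """One 'worker': its own connection, no Python objects shared with the others."""
    # timeout: wait up to 30 s for another writer's lock instead of failing at once.
    conn = sqlite3.connect(path, timeout=30)
    try:
        for _ in range(increments):
            with conn:  # one short transaction per increment
                # A single UPDATE is atomic, so the database serializes concurrent writers.
                conn.execute("UPDATE counters SET value = value + 1 WHERE name = 'hits'")
    finally:
        conn.close()


def read_hits(path):
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT value FROM counters WHERE name = 'hits'").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "state.db")
        init_db(db_path)
        workers = [threading.Thread(target=worker, args=(db_path, 25)) for _ in range(4)]
        for t in workers:
            t.start()
        for t in workers:
            t.join()
        print(read_hits(db_path))  # 100
```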

144

answered Oct 12 '22 23:10

Michael Merickel

The power and weakness of webservers is that they are in principle stateless. This enables them to be massively parallel. So, indeed, a different thread may be spawned for each page request. Whether or not this actually happens depends on your webserver's configuration settings. There's also a cost to spawning many threads, so where possible threads are reused from a thread pool.
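uWSGI's own process and thread counts are set in its configuration rather than in Python code, so the following is only a standard-library illustration of the thread-pool idea, assuming `concurrent.futures.ThreadPoolExecutor` as the pool. `handle_request` and `serve` are hypothetical names; the sleep stands in for I/O such as a database query.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Stand-in for a request handler; the sleep simulates I/O such as a database query."""
    time.sleep(0.01)
    return request_id, threading.get_ident()


def serve(num_requests, pool_size):
    # A fixed-size pool: threads are created once and reused for many requests,
    # instead of paying the cost of starting a new thread per request.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(handle_request, range(num_requests)))
    thread_ids = {tid for _, tid in results}
    return results, thread_ids


if __name__ == "__main__":
    results, thread_ids = serve(num_requests=20, pool_size=4)
    print(f"{len(results)} requests handled by {len(thread_ids)} threads")
```

The set of thread identifiers is never larger than `pool_size`, even though 20 requests were handled, because each worker thread picks up request after request instead of exiting.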

Almost all serious webservers have a page cache, so if the same page is requested multiple times, it can be retrieved from the cache. In addition, browsers do their own caching. A webserver has to be clever about what to cache and what not to. Static pages aren't a big problem, although they may be replaced, in which case it is quite confusing to keep getting the old page served from the cache.
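A per-process page cache can be sketched in a few lines. This is a hedged illustration, not how any particular webserver implements caching: the class name `PageCache`, the time-to-live policy, and the `render` callback are assumptions. The `clock` parameter exists only so the expiry logic can be tested without waiting.

```python
import time


class PageCache:
    """A minimal time-to-live (TTL) cache for rendered pages, keyed by URL path."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # path -> (expires_at, body)

    def get_or_render(self, path, render):
        now = self.clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip rendering entirely
        body = render(path)  # cache miss or expired entry
        self._entries[path] = (now + self.ttl, body)
        return body

    def invalidate(self, path):
        """Drop a page after its content changed, so nobody keeps getting the stale copy."""
        self._entries.pop(path, None)


if __name__ == "__main__":
    calls = []

    def render(path):
        calls.append(path)
        return f"<h1>{path}</h1>"

    cache = PageCache(ttl_seconds=60)
    cache.get_or_render("/about", render)
    cache.get_or_render("/about", render)
    print(len(calls))  # 1: the second request was served from the cache
    cache.invalidate("/about")
    cache.get_or_render("/about", render)
    print(len(calls))  # 2: invalidation forced a fresh render
```

The cache is not locked, so on a miss two threads may render the same page at once, which is wasteful but harmless here. Like any in-memory object, it is also private to one process; servers with several worker processes typically use a shared cache server instead.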

One way to defeat the cache is to add (dummy) query parameters to the page request.
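Cache busting usually means appending a query parameter whose value changes whenever the content does, such as a version number or a file hash. The sketch below assumes the parameter is called `v`; the name is arbitrary, since what matters is that caches see a new URL while the server, which typically ignores the extra parameter for static files, serves the same resource.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_buster(url, token):
    """Return url with a (dummy) 'v' query parameter set to token, keeping other parameters."""
    parts = urlsplit(url)
    # Drop any previous 'v' so repeated calls don't pile up parameters.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "v"]
    query.append(("v", str(token)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(with_cache_buster("https://example.com/static/app.css", 3))
    # https://example.com/static/app.css?v=3
    print(with_cache_buster("https://example.com/page?lang=en&v=1", "abc"))
    # https://example.com/page?lang=en&v=abc
```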

The statelessness of the web was initially welcomed as a necessity for scalability: webpages of busy sites could even be served concurrently from different servers at nearby or remote locations.

However, the trend is toward stateful apps. State can be maintained in the browser, easing the burden on the server. If it's maintained on the server, the server needs to know 'who's talking'. One way to do this is by setting and recognizing cookies (small identifiable bits of data) on the client.
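The cookie mechanism can be illustrated with the standard library's `http.cookies` and `secrets` modules. Django and Pyramid ship their own session machinery, so this is only a sketch of the idea: the cookie name `sessionid`, the helper functions, and the in-memory `SESSIONS` dict are all assumptions. A module-level dict is exactly the kind of per-process state the first answer warns about; with several worker processes it would have to live in a database or cache server.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side session store: session id -> per-user data.
# Per-process only; a multi-process deployment needs a shared store instead.
SESSIONS = {}


def load_session(cookie_header):
    """Return (session_id, data) for the session named in the Cookie header, creating one if needed."""
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)
    morsel = cookie.get("sessionid")
    if morsel is not None and morsel.value in SESSIONS:
        return morsel.value, SESSIONS[morsel.value]
    session_id = secrets.token_urlsafe(16)  # unguessable identifier
    SESSIONS[session_id] = {}
    return session_id, SESSIONS[session_id]


def set_cookie_header(session_id):
    """Build the Set-Cookie header that asks the browser to send the id back on later requests."""
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["httponly"] = True  # not readable from JavaScript
    cookie["sessionid"]["path"] = "/"
    return ("Set-Cookie", cookie["sessionid"].OutputString())


if __name__ == "__main__":
    # First request: no cookie yet, so a new session is created.
    sid, data = load_session(None)
    data["user"] = "alice"
    print(set_cookie_header(sid))
    # Later request: the browser sends the cookie back and the same data is found.
    _, again = load_session(f"sessionid={sid}")
    print(again["user"])  # alice
```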

For databases the story is a bit different. As soon as anything gets stored that relates to a particular user, the application is in principle stateful. While there's no conceptual difference between retaining state on disk and in RAM, traditionally statefulness was left to the database, which in turn used thread pools and load balancing to do its job efficiently.

With the advent of very large internet shops like Amazon and Google, mandatory disk access for statefulness created a performance problem. The answer was in-memory databases. While they may be accessed in the traditional way, e.g. using SQL, they offer much more flexibility in how data is stored conceptually.

A type of database that enjoys growing popularity is the persistent object store. With such a database, although the distinction can still be made formally, the boundary between webserver and database is blurred. Both keep their data in RAM (but can swap to disk if needed), and both work with objects rather than flat records as in SQL tables. These objects can be interconnected in complex ways.
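Python's standard `shelve` module is a very small-scale example of the idea: it persists ordinary Python objects to disk by pickling them, so an object graph with cross-references (even cycles) can be stored and reloaded without flattening it into tables. The key name and the dictionary layout below are illustrative. Note that `shelve` pickles each value separately, so references are preserved only within the graph stored under one key, and it does not support concurrent writers; dedicated persistent object stores such as ZODB add transactions and concurrency control on top of this idea.

```python
import os
import shelve
import tempfile


def save_and_reload(path):
    """Store a small graph of interlinked objects, then read it back from disk."""
    alice = {"name": "alice", "orders": []}
    order = {"item": "book", "customer": alice}  # the order points back at its customer
    alice["orders"].append(order)               # ...and the customer points at the order

    with shelve.open(path) as db:
        db["customer:alice"] = alice  # the whole graph is pickled under one key

    with shelve.open(path) as db:
        return db["customer:alice"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        loaded = save_and_reload(os.path.join(tmp, "objects"))
        # The cycle survives the round trip: no join tables or foreign keys needed.
        print(loaded["orders"][0]["customer"] is loaded)  # True
```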

In short, there's an explosion of innovative storage / thread pooling / caching / persistence / redundancy / synchronisation technology, driving what has become popularly known as 'the cloud'.

answered Oct 12 '22 23:10

Jacques de Hooge
