How to share in memory resources between Flask methods when deploying with Gunicorn

Tags: python, flask, multiprocessing, gunicorn

I have implemented a simple microservice using Flask, in which the request handler computes a response from the request data and a rather large data structure loaded into memory. When I deploy this application using gunicorn with a large number of workers, I would like to share the data structure between the request handlers of all workers. Since the data is only read, there is no need for locking or similar. What is the best way to do this?

Essentially what would be needed is this:

- load/create the large data structure when the server is initialized
- somehow get a handle inside the request handling method to access the data structure

As far as I understand, gunicorn allows me to implement various hook functions, e.g. for when the server is initialized, but a Flask request handler does not know anything about the gunicorn server data structure.

I do not want to use something like redis or a database system for this, since all data is in a data structure that needs to be loaded in memory, and no deserialization should be involved.

The calculation carried out for each request, which uses the large data structure, can be lengthy, so it must happen concurrently in a truly independent thread or process for each request (this should scale up when running on a multi-core computer).

asked Feb 24 '17 13:02

jpp1

1 Answer

You can use gunicorn's preloading (the --preload option).

With preloading, the application, and with it the data structure, is loaded once in the master process before the request-handling worker processes are forked. This works because of copy-on-write and because you only read from the large data structure.
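The load-then-fork idea can be sketched without gunicorn at all. The following is only an illustration using the standard library on a Unix-like system (the "fork" start method is not available on Windows); the names load_data, worker and run_demo are made up for this sketch and are not part of gunicorn or Flask. The parent loads the data once, and each forked child reads it without loading it again.

```python
import multiprocessing as mp
import os

DATA = None            # the "large" read-only structure, filled in by the parent
LOADED_IN_PID = None   # records which process actually ran load_data()


def load_data():
    """Stand-in for the expensive load step (hypothetical helper)."""
    global LOADED_IN_PID
    LOADED_IN_PID = os.getpid()
    return {"big": "data"}


def worker(conn):
    # The child was forked after DATA was loaded, so it inherits the
    # parent's memory and can read DATA without loading it again.
    conn.send((DATA["big"], LOADED_IN_PID, os.getpid()))
    conn.close()


def run_demo(n_workers=2):
    """Load once in the parent, then fork n_workers children that only read."""
    global DATA
    DATA = load_data()
    ctx = mp.get_context("fork")  # assumption: a platform that supports fork
    reports = []
    for _ in range(n_workers):
        parent_conn, child_conn = ctx.Pipe()
        proc = ctx.Process(target=worker, args=(child_conn,))
        proc.start()
        reports.append(parent_conn.recv())
        proc.join()
        parent_conn.close()
    return reports


if __name__ == "__main__":
    for value, loaded_pid, worker_pid in run_demo():
        print(f"worker {worker_pid} read {value!r}, loaded in pid {loaded_pid}")
```

Each report should show a different worker pid but the same loading pid, namely the parent's.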

Note: Although this works, it should probably be used only for very small apps or in a development environment. I think the more production-friendly approach would be to queue these calculations as background tasks, since they are long-running. You can then notify users when a task has completed.
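As a rough illustration of the "queue it as a task" idea, here is a minimal in-process sketch built on the standard library's concurrent.futures. It is not a production task queue: the JobQueue class, its methods and the calculate function are hypothetical names invented for this example, and the job table lives in a single process, so other gunicorn workers would not see it. A real deployment would more likely use a dedicated task queue. It assumes a platform that supports the "fork" start method.

```python
import multiprocessing as mp
import uuid
from concurrent.futures import ProcessPoolExecutor

DATA = {"big": "data"}  # stand-in for the large read-only structure


def calculate(key):
    """Placeholder for the lengthy calculation that reads DATA."""
    return DATA[key].upper()


class JobQueue:
    """Hypothetical helper: submit a calculation, poll for its result later."""

    def __init__(self, max_workers=2):
        # Forked pool workers inherit DATA from this process.
        self._pool = ProcessPoolExecutor(
            max_workers=max_workers, mp_context=mp.get_context("fork")
        )
        self._jobs = {}

    def submit(self, key):
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(calculate, key)
        return job_id

    def status(self, job_id):
        future = self._jobs[job_id]
        if not future.done():
            return {"state": "pending"}
        if future.exception() is not None:
            return {"state": "failed", "error": repr(future.exception())}
        return {"state": "done", "result": future.result()}

    def wait(self, job_id, timeout=None):
        # Block until the job finishes (or the timeout expires).
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown()


if __name__ == "__main__":
    queue = JobQueue()
    job = queue.submit("big")
    print(queue.wait(job, timeout=30))
    print(queue.status(job))
    queue.shutdown()
```

In a Flask app, one route could call submit and return the job id, and a second route could call status so the client can poll until the state is "done".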

Here is a small snippet that shows the difference preloading makes.

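# app.py

import flask

app = flask.Flask(__name__)

def load_data():
    # Stand-in for the expensive load; the print shows when and how often it runs.
    print('calculating some stuff')
    return {'big': 'data'}

@app.route('/')
def index():
    # Read-only access to the module-level data structure.
    return repr(data)

# Runs at import time: once per worker by default, once in the master with --preload.
data = load_data()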

Running with gunicorn app:app --workers 2:

[2017-02-24 09:01:01 -0500] [38392] [INFO] Starting gunicorn 19.6.0
[2017-02-24 09:01:01 -0500] [38392] [INFO] Listening at: http://127.0.0.1:8000 (38392)
[2017-02-24 09:01:01 -0500] [38392] [INFO] Using worker: sync
[2017-02-24 09:01:01 -0500] [38395] [INFO] Booting worker with pid: 38395
[2017-02-24 09:01:01 -0500] [38396] [INFO] Booting worker with pid: 38396
calculating some stuff
calculating some stuff

And running with gunicorn app:app --workers 2 --preload:

calculating some stuff
[2017-02-24 09:01:06 -0500] [38403] [INFO] Starting gunicorn 19.6.0
[2017-02-24 09:01:06 -0500] [38403] [INFO] Listening at: http://127.0.0.1:8000 (38403)
[2017-02-24 09:01:06 -0500] [38403] [INFO] Using worker: sync
[2017-02-24 09:01:06 -0500] [38406] [INFO] Booting worker with pid: 38406
[2017-02-24 09:01:06 -0500] [38407] [INFO] Booting worker with pid: 38407
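The same options can also be kept in a gunicorn configuration file instead of on the command line. A minimal sketch, assuming a file named gunicorn.conf.py placed next to app.py (the file name and the values are illustrative; preload_app, workers and bind are gunicorn settings):

```python
# gunicorn.conf.py (illustrative file name)

bind = "127.0.0.1:8000"  # same address as in the logs above
workers = 2              # equivalent to --workers 2
preload_app = True       # equivalent to --preload: import the app before forking workers
```

It would then be started with gunicorn -c gunicorn.conf.py app:app.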

answered Sep 21 '22 15:09

Jared
