Memory Sharing among workers in gunicorn using --preload

I have a large model file that I use in my web service, built in Flask and served through Gunicorn. The folder structure is like this:

A.py
Folder_1\
    __init__.py
    B.py

The model is loaded in __init__.py and used in B.py. The entry point is A.py, which contains the @app.route handlers, etc.
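For concreteness, a minimal sketch of what that layout might look like (the pickle-based loading and the file name model.pkl are assumptions, not taken from the original post):

    # Folder_1/__init__.py -- hypothetical sketch, not the poster's actual code
    import pickle

    # Loading at import time means the object is created when the module is
    # first imported, i.e. in the Gunicorn master process when --preload is used.
    with open("model.pkl", "rb") as f:   # assumed model file name
        model = pickle.load(f)

    # Folder_1/B.py would then use the module-level object, e.g.:
    # from Folder_1 import model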

I start A.py with Gunicorn, preloading the app with the --preload option and running 8 workers.
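That invocation presumably looks something like this (the module:variable name A:app is an assumption based on the description above):

    gunicorn --preload --workers 8 A:app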

I am seeing 100% CPU utilization on all 8 cores; apparently the requests are stuck at the app server and are not being forwarded to the DB. Is the model also preloaded and made available to all 8 workers, i.e. is it shared between the worker processes? If not, do I have to load the model in A.py so that it is preloaded for all workers?

I think the model is being loaded by every worker process and since the model is large, the workers are stuck there.

EDIT 1: Since I was notified that this might be a duplicate question, I want to clarify that I am not asking how Python handles shared objects; I understand that is possible using multiprocessing. In my case, I start the Flask server from Gunicorn with 8 workers using the --preload option, so there are 8 instances of my app running. My question is: since the code was preloaded before the workers were forked, will the Gunicorn workers share the same model object, or will each have a separate copy?

Asked Jul 13 '17 by greenlantern

People also ask

How to improve performance when using Gunicorn?

To improve performance when using Gunicorn, we have to keep in mind three means of concurrency. Each worker is a UNIX process that loads the Python application. There is no shared memory between the workers. The suggested number of workers is (2*CPU)+1; for a dual-core (2 CPU) machine, the suggested value is 5 workers.
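A common way to apply that formula is in a Gunicorn config file (a sketch; the file would be passed with gunicorn -c gunicorn.conf.py A:app, with A:app assumed as above):

    # gunicorn.conf.py -- sketch of the (2*CPU)+1 rule of thumb
    import multiprocessing

    workers = multiprocessing.cpu_count() * 2 + 1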

How does Gunicorn handle concurrency?

The OS kernel handles load balancing between worker processes. Each worker is a UNIX process that loads the Python application, and there is no shared memory between the workers.

Does sync worker support persistent connections with Gunicorn?

This refers to Gunicorn's keep-alive setting: when Gunicorn is deployed behind a load balancer, it often makes sense to set it to a higher value. The sync worker does not support persistent connections and will ignore this option.
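For example, a threaded worker class can be combined with a longer keep-alive timeout (a sketch; the module name A:app follows the question above):

    gunicorn --workers 4 --worker-class gthread --keep-alive 75 A:app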

How does Gunicorn handle multiple threads?

Gunicorn also allows for each of the workers to have multiple threads. In this case, the Python application is loaded once per worker, and each of the threads spawned by the same worker shares the same memory space. To use threads with Gunicorn, we use the threads setting.
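A sketch of combining workers and threads (names assumed as above):

    gunicorn --workers 4 --threads 2 A:app

Setting --threads above 1 switches Gunicorn to the gthread worker class, so each of the 4 processes serves requests on 2 threads that share that process's memory.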


1 Answer

My question is: since the code was preloaded before the workers were forked, will the Gunicorn workers share the same model object, or will each have a separate copy?

It will be a separate copy.

Preloading simply takes advantage of the fact that when the operating system's fork() call is used to create a new process, the OS is able to share unmodified pages of memory between the two processes (copy-on-write). By preloading as much code as possible, more memory is shared between the processes.

This is simply a behind-the-scenes operating-system optimization: from the perspective of each individual Python process, each has its own unique copy of every object.
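A minimal sketch of that behaviour using a bare os.fork() (Unix only; the dict here just stands in for the large model):

    import os

    # "model" stands in for the large object loaded before the fork (as with --preload).
    model = {"weights": [0.0] * 5}

    pid = os.fork()
    if pid == 0:
        # Child process: writing to the object triggers copy-on-write, so only
        # the child's copy changes; the parent's copy is untouched.
        model["weights"][0] = 42.0
        print("child sees:", model["weights"][0])    # 42.0
        os._exit(0)
    else:
        os.waitpid(pid, 0)
        print("parent sees:", model["weights"][0])   # still 0.0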

Answered Nov 14 '22 by Levi