I have 4 machine learning models of about 2GB each, i.e. 8GB in total. I am getting around 100 requests at a time, and each request takes around 1 second to process. My machine has 15GB of RAM. If I increase the number of Gunicorn workers, total memory consumption goes up, so I can't run more than 2 workers.
So I have a few questions:

- How can I share the models (or memory) between the workers?
- Should I use sync or async workers in this situation?
- Is the `preload` option in Gunicorn a solution? I tried it, but it didn't help; maybe I am using it in the wrong way.

Here is the Flask code which I am using:
https://github.com/rathee/learnNshare/blob/master/agent_api.py
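For context, the behaviour the `preload` option is supposed to provide can be sketched with a small simulation (everything here is illustrative: `load_model` stands in for an expensive load such as `joblib.load`, and `os.fork` stands in for Gunicorn forking a worker):

```python
import os

LOAD_COUNT = 0  # counts how many times a model is actually loaded

def load_model(name):
    # Stand-in for an expensive load such as joblib.load(path);
    # the name and the returned dict are purely illustrative.
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": name}

# With `gunicorn --preload`, the app module is imported once in the
# master process BEFORE workers are forked, so module-level code like
# this runs once and the loaded models are shared with every worker
# via copy-on-write memory.
MODELS = [load_model(f"model_{i}") for i in range(4)]

# Simulate Gunicorn forking a worker after preloading:
pid = os.fork()
if pid == 0:
    # The child ("worker") already sees the models; nothing is reloaded.
    assert len(MODELS) == 4 and LOAD_COUNT == 4
    os._exit(0)
os.waitpid(pid, 0)
print("loads in parent:", LOAD_COUNT)  # 4 total, not 4 per worker
```

One caveat worth knowing: for `--preload` to help, the models must be loaded at module import time (not lazily inside a request handler), and even then CPython's reference counting can gradually un-share the copy-on-write pages, so the savings may erode over time.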
Use the gevent worker (or another event-loop worker), not the default worker. The default sync worker handles one request at a time per worker process, while an async worker can handle many concurrent requests per process, as long as each request is non-blocking, i.e. spends its time waiting on I/O rather than on CPU-bound work.
gunicorn -k gevent myapp:app
You will, of course, need to install gevent for this: pip install gevent.
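This can be combined with the `--preload` option mentioned in the question, so the models are loaded once before forking. A sketch of the invocation (the module name `agent_api` is taken from the linked file; the worker and connection counts are illustrative, not tuned values):

```shell
gunicorn --preload -k gevent -w 2 --worker-connections 100 agent_api:app
```

Here `-w 2` keeps the process count within the RAM budget described above, while `--worker-connections` caps how many concurrent requests each gevent worker will accept.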