loading models in FastAPI projects at startup

I'm currently working on a FastAPI project that serves multiple NLP services. For this I want to provide different models from spaCy as well as Hugging Face.

Since those models are quite big, loading them inside each POST request makes inference very slow. My idea is to load all the models on FastAPI startup (in app/main.py). However, I'm not sure whether this is a good idea or whether there are drawbacks to this approach, since the models will stay in memory the whole time. (Info: I want to dockerize the project and deploy it on a virtual machine afterwards.)

So far I haven't been able to find any guidance on this, so I hope to get a good answer here :)

Thanks in advance!

Coczor asked Apr 07 '26 12:04


1 Answer

If you are deploying your app with the gunicorn + uvicorn worker stack, you can use gunicorn's --preload flag.

From the gunicorn documentation:

preload_app

--preload Default: False

Load application code before the worker processes are forked.

By preloading an application you can save some RAM resources as well as speed up server boot times. Although, if you defer application loading to each worker process, you can reload your application code easily by restarting workers.

You just need to add the --preload flag to your run command:

gunicorn --workers 2 --preload --worker-class=uvicorn.workers.UvicornWorker my_app:app
Yagiz Degirmenci answered Apr 12 '26 09:04


