loading models in FastAPI projects at startup

I'm currently working on a FastAPI project that serves multiple NLP services. For this I want to provide different models from spaCy as well as Hugging Face.

Since those models are quite big, loading them inside each POST request makes inference very slow. My idea is to load all the models on FastAPI startup (in app/main.py). However, I'm not sure whether this is a good idea or whether there are drawbacks to this approach, since the models will stay in memory the whole time. (Info: I want to dockerize the project and deploy it on a virtual machine afterwards.)

So far I haven't been able to find any guidance on this, so I hope to get a good answer here :)

Thanks in advance!

Coczor asked Apr 07 '26 12:04


1 Answer

If you are deploying your app with the gunicorn + uvicorn worker stack, you can use gunicorn's --preload flag.

From the gunicorn documentation:

preload_app

--preload Default: False

Load application code before the worker processes are forked.

By preloading an application you can save some RAM resources as well as speed up server boot times. Although, if you defer application loading to each worker process, you can reload your application code easily by restarting workers.

You just need to add the --preload flag to your run command:

gunicorn --workers 2 --preload --worker-class=uvicorn.workers.UvicornWorker my_app:app
Yagiz Degirmenci answered Apr 12 '26 09:04


