Caching large objects in a python Flask/Gevent web service

I am building a Python-based web service that provides natural language processing support to our main app API. Since it's so NLP-heavy, it requires unpickling a few very large (50-300MB) corpus files from disk before it can do any kind of analysis.

How can I load these files into memory so that they are available to every request? I experimented with memcached and redis but they seem designed for much smaller objects. I have also been trying to use the Flask g object, but this only persists throughout one request.

Is there any way to do this while using a gevent (or other) server to allow concurrent connections? The corpora are completely read-only so there ought to be a safe way to expose the memory to multiple greenlets/threads/processes.

Thanks so much and sorry if it's a stupid question - I've been working with python for quite a while but I'm relatively new to web programming.

asked May 11 '26 03:05 by sbrother


1 Answer

If you are using Gevent you can have your read-only data structures in the global scope of your process and they will be shared by all the greenlets. With Gevent your server will be contained in a single process, so the data can be loaded once and shared among all the worker greenlets.

A good way to encapsulate access to the data is to put the access function(s) or class(es) in a module. You can do the unpickling of the data when the module is imported, or you can trigger this task the first time someone calls a function in the module.
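The lazy-loading variant of this pattern might look like the sketch below, as a module of its own (the path and the `get_corpus` name are hypothetical; adapt them to your corpora):

```python
import pickle

# Hypothetical location of one of the pickled corpus files.
CORPUS_PATH = "/data/corpora/main_corpus.pkl"

# Module-level cache: lives in the (single) gevent process,
# shared by every greenlet that imports this module.
_corpus = None

def get_corpus(path=CORPUS_PATH):
    """Unpickle the corpus on first call, then return the cached object."""
    global _corpus
    if _corpus is None:
        with open(path, "rb") as f:
            _corpus = pickle.load(f)
    return _corpus
```

Every request handler then calls `get_corpus()` and gets the same in-memory object; only the first caller pays the load cost. Under gevent this lazy initialization is cooperative, so no greenlet can interrupt another mid-load unless the load itself yields.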

You will need to make sure there is no possibility of introducing a race condition, but if the data is strictly read-only you should be fine.
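Putting it together, a minimal Flask app following this advice could look like the sketch below (the route, the dict contents, and the file path in the comment are stand-ins for illustration, not your actual corpora):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Loaded once at import time, before the server accepts requests.
# A real service would do something like:
#     _CORPUS = pickle.load(open("/data/corpora/main_corpus.pkl", "rb"))
# The dict below is a tiny stand-in so the sketch is self-contained.
_CORPUS = {"python": "a programming language"}

@app.route("/define/<word>")
def define(word):
    # Read-only lookup against the shared object: safe across greenlets.
    return jsonify(word=word, definition=_CORPUS.get(word))
```

You would then serve it with gevent's WSGI server, e.g. `gevent.pywsgi.WSGIServer(("", 5000), app).serve_forever()`, so all requests run as greenlets in the one process that holds the corpora.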

answered May 12 '26 15:05 by Miguel