I am building a Python-based web service that provides natural language processing support to our main app's API. Since it is so NLP-heavy, it needs to unpickle a few very large (50-300 MB) corpus files from disk before it can do any kind of analysis.
How can I load these files into memory so that they are available to every request? I experimented with memcached and Redis, but they seem designed for much smaller objects. I have also tried Flask's g object, but that only persists for a single request.
Is there any way to do this while using a gevent (or other) server to allow concurrent connections? The corpora are completely read-only so there ought to be a safe way to expose the memory to multiple greenlets/threads/processes.
Thanks so much and sorry if it's a stupid question - I've been working with python for quite a while but I'm relatively new to web programming.
If you are using gevent, you can keep your read-only data structures in the global scope of your process and they will be shared by all the greenlets. With gevent your server runs in a single process, so the data can be loaded once and shared among all the worker greenlets.
A good way to encapsulate access to the data is to put the access function(s) or class(es) in a module. You can do the unpickling of the data when the module is imported, or you can trigger that task the first time someone calls a function in the module.
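As a minimal sketch of the import-time approach, assuming the corpora are plain pickle files on disk (the module name, CORPUS_DIR, and the file names below are placeholders for whatever you actually use):

```python
# corpus.py -- hypothetical module; CORPUS_DIR and the .pkl names are
# placeholders for wherever your pickled corpora actually live.
import os
import pickle

CORPUS_DIR = os.environ.get("CORPUS_DIR", "/srv/nlp/corpora")

def _load(name):
    """Unpickle one corpus file from disk."""
    with open(os.path.join(CORPUS_DIR, name), "rb") as f:
        return pickle.load(f)

# Loaded once, when the module is first imported, then shared
# by every greenlet in the process.
LEXICON = _load("lexicon.pkl")
NGRAMS = _load("ngrams.pkl")
```

Any request handler can then just `from corpus import LEXICON` and read it directly; the import only happens once per process.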
You will need to make sure there is no possibility of introducing a race condition, but if the data is strictly read-only you should be fine.
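If you prefer to defer loading until the first request, the only write that ever happens is filling the cache, so guarding that one step is enough. A possible sketch of the lazy variant (again with hypothetical names; under gevent's monkey patching the threading lock behaves as a greenlet-aware lock):

```python
# corpus_lazy.py -- lazy-loading variant (hypothetical names).
import os
import pickle
import threading

CORPUS_DIR = os.environ.get("CORPUS_DIR", "/srv/nlp/corpora")

_cache = {}
_lock = threading.Lock()

def get_corpus(name):
    """Return the unpickled corpus, loading it on first request only."""
    if name not in _cache:
        with _lock:
            # Re-check inside the lock: another greenlet may have
            # loaded the corpus while we were waiting.
            if name not in _cache:
                path = os.path.join(CORPUS_DIR, name + ".pkl")
                with open(path, "rb") as f:
                    _cache[name] = pickle.load(f)
    return _cache[name]
```

After the first call, every subsequent `get_corpus("lexicon")` is just a dictionary lookup, and since nothing ever mutates the loaded objects, concurrent reads are safe.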