
How to easily store Python-usable read-only data structures in shared memory

I have a Python process serving as a WSGI-Apache server. Many copies of this process run on each of several machines. About 200 megabytes of each process is read-only Python data. I would like to place this data in a memory-mapped segment so that the processes can share a single copy of it. Ideally, I could attach to the data so that they are actual Python 2.7 data objects, rather than parsing them out of something like pickle, DBM, or SQLite.

Does anyone have sample code or pointers to a project that has done this to share?

Jim asked Feb 17 '15 20:02


3 Answers

This post by @modelnine on StackOverflow provides a comprehensive answer to this question. As he mentions, using threads rather than forked processes in your web server can significantly lessen the impact of the problem. A couple of years ago I ran into a similar problem when trying to share extremely large NumPy arrays between CLI Python processes through shared memory. We ended up using a sharedmem Python extension to share data between the workers (it leaked memory in certain cases, but that is probably fixable). A read-only mmap() technique might work for you, but I'm not sure how to do that in pure Python (NumPy has a memmapping technique explained here). I've never found a clear and simple answer to this question, but hopefully this points you in some new directions. Let us know what you end up doing!
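For what it's worth, a read-only mapping can be sketched with only the standard library. The example below is a minimal illustration, not the approach from the linked post: it assumes the data can be flattened into fixed-width records, and the names `RECORD`, `write_table`, `open_table`, and `get` are hypothetical helpers invented here. It is written for Python 3; the `mmap` and `struct` modules also exist in Python 2.7, but this sketch has not been checked there.

```python
import mmap
import os
import struct
import tempfile

# Assumed layout: one little-endian signed 64-bit integer per record.
RECORD = struct.Struct("<q")


def write_table(path, values):
    """Serialise the values once into a flat binary file."""
    with open(path, "wb") as f:
        for value in values:
            f.write(RECORD.pack(value))


def open_table(path):
    """Map the whole file read-only; the mapping outlives the file object."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def get(table, index):
    """Decode one record directly from the mapping."""
    return RECORD.unpack_from(table, index * RECORD.size)[0]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "table.bin")
    write_table(path, [10, 20, 30])
    table = open_table(path)
    print(get(table, 1), len(table) // RECORD.size)
    table.close()
```

Every process that maps the same file read-only is backed by the same pages in the operating system's page cache, so the bulk data exists once in physical memory. The cost is that each access decodes bytes into a fresh Python object; the objects themselves are not shared.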

Will answered Oct 19 '22 01:10


It's difficult to share actual Python objects because they are bound to the address space of the process that created them. However, if you use mmap, you can create very usable shared objects. I'd create one process to pre-load the data into the mapped file, and the rest could then read from it. I found quite a good blog post that describes how it can be done: http://blog.schmichael.com/2011/05/15/sharing-python-data-between-processes-using-mmap/
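As a rough sketch of the "one process pre-loads, the rest read" idea, the example below has a builder write a sorted file of fixed-width key/value entries, and a reader class that maps the file read-only and looks keys up by binary search. This is not the code from the blog post. The entry layout (a 16-byte key plus a float64 value) and the names `build_index` and `SharedIndex` are assumptions made for illustration, and the code targets Python 3.

```python
import mmap
import os
import struct
import tempfile

KEY_SIZE = 16                      # assumed maximum key length in bytes
ENTRY = struct.Struct("<16sd")     # NUL-padded key followed by a float64 value


def _pad(key):
    raw = key.encode("utf-8")
    if len(raw) > KEY_SIZE:
        raise ValueError("key too long for this layout: %r" % key)
    return raw.ljust(KEY_SIZE, b"\0")


def build_index(path, mapping):
    """Run once by the pre-loading process: write entries sorted by key."""
    entries = sorted((_pad(k), v) for k, v in mapping.items())
    with open(path, "wb") as f:
        for key, value in entries:
            f.write(ENTRY.pack(key, value))


class SharedIndex(object):
    """Read-only, dict-like view over the mapped file."""

    def __init__(self, path):
        with open(path, "rb") as f:
            self._map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self._count = len(self._map) // ENTRY.size

    def __len__(self):
        return self._count

    def __getitem__(self, key):
        target = _pad(key)
        lo, hi = 0, self._count
        while lo < hi:             # binary search over the sorted entries
            mid = (lo + hi) // 2
            found, value = ENTRY.unpack_from(self._map, mid * ENTRY.size)
            if found == target:
                return value
            if found < target:
                lo = mid + 1
            else:
                hi = mid
        raise KeyError(key)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "index.bin")
    build_index(path, {"pear": 2.5, "apple": 1.0})
    print(SharedIndex(path)["pear"])
```

Each worker would construct its own `SharedIndex` on the same path. A lookup touches only a handful of pages, so workers never need to hold the full data set as Python objects.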

Gary Wisniewski answered Oct 19 '22 01:10


Since the data is read-only, there are no updates to share between processes, so I propose you just keep a local copy of it in each process.

If memory is a constraint, you can have a look at using multiprocessing.Value or multiprocessing.Array without locks for this: https://docs.python.org/2/library/multiprocessing.html#shared-ctypes-objects
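A minimal sketch of the lock-free array idea follows. It assumes Python 3 on a POSIX system, because it asks for the "fork" start method explicitly so that children inherit the array. The helper names `build_shared_table`, `_worker`, and `read_from_children` are hypothetical.

```python
import multiprocessing as mp

# Assumption: POSIX only; the "fork" start method is unavailable on Windows.
CTX = mp.get_context("fork")


def build_shared_table(values):
    """Allocate a shared array of doubles with no lock wrapper (read-only use)."""
    return CTX.Array("d", values, lock=False)


def _worker(table, index, conn):
    # The child reads the parent's shared memory directly; nothing is copied.
    conn.send(table[index])
    conn.close()


def read_from_children(table, indexes):
    """Start one child per index and collect the value each child reads."""
    results = []
    for index in indexes:
        receiver, sender = CTX.Pipe(duplex=False)
        proc = CTX.Process(target=_worker, args=(table, index, sender))
        proc.start()
        results.append(receiver.recv())
        proc.join()
    return results


if __name__ == "__main__":
    table = build_shared_table([1.5, 2.5, 3.5])
    print(read_from_children(table, [0, 2]))
```

Note that the array is shared only with processes started from the one that created it. Whether that fits a given Apache/WSGI deployment depends on how the server starts its workers, which this sketch does not address.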

Other than that, you'll have to rely on an external process and some serialisation to get this done. I'd have a look at Redis or Memcached if I were you.

Raniz answered Oct 19 '22 02:10