Sharing large object between different processes in Python 3.4

I am trying to share a large object (~2 GB) between different processes in Python in order to cut down on memory usage. I have learned about the Manager class and proxies in the multiprocessing library (https://docs.python.org/3.4/library/multiprocessing.html#multiprocessing-managers). However, according to the documentation and to other Stack Overflow users, this can be very slow when used on large objects like this one. Is this correct, and if so, is there a faster Python library or function that I can use instead? Thanks.

EDIT: The object I created is a DAG (directed acyclic graph), though its constructor only takes standard Python values.

Alexander Whatley asked Aug 22 '15 02:08



2 Answers

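If your data is limited to simple values and arrays of C types (no other Python objects), you can use shared memory through Value() and Array() (see https://docs.python.org/3.4/library/multiprocessing.html#shared-ctypes-objects). It is very fast, because every process reads and writes the same memory instead of exchanging pickled copies through a manager process.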
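A minimal sketch of the shared-ctypes approach, assuming the data has already been flattened into numeric arrays (a DAG would first have to be encoded that way, e.g. as integer adjacency arrays). The names `scale` and `demo_shared_ctypes` are hypothetical; `Value`, `Array` and `get_lock()` are the standard multiprocessing API. The start method choice ("fork" where available, else "spawn") is only there so the sketch runs the same way on every platform.

```python
import multiprocessing


def scale(shared_array, counter, factor):
    # The child writes directly into the shared buffer; nothing is pickled back.
    with shared_array.get_lock():
        for i in range(len(shared_array)):
            shared_array[i] *= factor
    with counter.get_lock():
        counter.value += 1


def demo_shared_ctypes():
    methods = multiprocessing.get_all_start_methods()
    ctx = multiprocessing.get_context("fork" if "fork" in methods else "spawn")
    data = ctx.Array("d", [1.0, 2.0, 3.0])  # 'd' = C double, lock=True by default
    calls = ctx.Value("i", 0)                # 'i' = C int
    worker = ctx.Process(target=scale, args=(data, calls, 10.0))
    worker.start()
    worker.join()
    # The parent sees the child's writes because both use the same memory.
    return data[:], calls.value


if __name__ == "__main__":
    print(demo_shared_ctypes())
```

The catch is that only fixed-size C types fit here; dicts, lists of objects or class instances cannot be placed in an `Array` without encoding them first.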

Dragan Nikolic answered Sep 18 '22 03:09



One solution to the problem is to run the graph in its own process, which exposes methods that other processes call through proxies. This means you have to build a class similar to the ones behind manager.dict and manager.value, so that only method arguments and results cross process boundaries, not the whole graph. It follows a producer/consumer pattern and is a form of inter-process communication (IPC), more specifically remote procedure calls (RPC). Solutions might involve zeroless or Pyro.
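A minimal sketch of this idea using only the standard library: a custom `BaseManager` subclass registers a graph class, so the graph object lives inside the manager's server process and callers only get a proxy. `Graph`, `GraphManager`, `count_successors` and `demo_graph_manager` are hypothetical names; `BaseManager.register` and the `ctx` argument are standard multiprocessing API. Each call such as `successors()` sends a small request to the server and pickles back only the result list.

```python
import multiprocessing
from multiprocessing.managers import BaseManager


class Graph:
    """Hypothetical DAG that lives only inside the manager's server process."""

    def __init__(self):
        self._edges = {}  # node -> list of successor nodes

    def add_edge(self, src, dst):
        self._edges.setdefault(src, []).append(dst)
        self._edges.setdefault(dst, [])

    def successors(self, node):
        # Only this small list is pickled back to the caller, never the graph.
        return list(self._edges.get(node, []))


class GraphManager(BaseManager):
    pass


# manager.Graph() now builds a Graph in the server process and returns a proxy.
GraphManager.register("Graph", Graph)


def count_successors(graph_proxy, node):
    # Runs in a worker process; the call is forwarded to the server process.
    return len(graph_proxy.successors(node))


def demo_graph_manager():
    methods = multiprocessing.get_all_start_methods()
    ctx = multiprocessing.get_context("fork" if "fork" in methods else "spawn")
    with GraphManager(ctx=ctx) as manager:
        graph = manager.Graph()
        graph.add_edge("a", "b")
        graph.add_edge("a", "c")
        graph.add_edge("b", "c")
        # Workers receive the proxy, which reconnects to the same server.
        with ctx.Pool(2) as pool:
            counts = pool.starmap(count_successors, [(graph, "a"), (graph, "b")])
        return graph.successors("a"), counts


if __name__ == "__main__":
    print(demo_graph_manager())
```

Every method call is a round trip to the server process, so this pays off when each query returns little data; a chatty access pattern (many tiny calls) will still be slow.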

Another, simpler solution is to use a database, for instance bsddb or lmdb, which support at least concurrent read access from multiple processes. Using ajgu, or the simpler design it is built on, can save you from writing a lot of code.
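A minimal sketch of the "build once, read from many processes" database approach. bsddb and lmdb are third-party packages in Python 3, so this sketch swaps in the standard library's `sqlite3` as a stand-in; the same structure would apply to lmdb, but its API is not shown here. `build_graph_db`, `sqlite_successors` and `demo_sqlite_graph` are hypothetical names, and the single `edges` table is an assumed schema.

```python
import multiprocessing
import os
import sqlite3
import tempfile
from pathlib import Path


def build_graph_db(path, edges):
    """Write the DAG's edge list once; afterwards it is only read."""
    conn = sqlite3.connect(path)
    with conn:  # commits the transaction on success
        conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
        conn.execute("CREATE INDEX edges_src ON edges (src)")
        conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    conn.close()


def sqlite_successors(path, node):
    # Each process opens its own read-only connection to the same file.
    conn = sqlite3.connect(Path(path).as_uri() + "?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT dst FROM edges WHERE src = ? ORDER BY dst", (node,))
        return [dst for (dst,) in rows]
    finally:
        conn.close()


def demo_sqlite_graph():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "graph.sqlite")
        build_graph_db(path, [("a", "b"), ("a", "c"), ("b", "c")])
        methods = multiprocessing.get_all_start_methods()
        ctx = multiprocessing.get_context("fork" if "fork" in methods else "spawn")
        with ctx.Pool(2) as pool:
            return pool.starmap(
                sqlite_successors, [(path, "a"), (path, "b"), (path, "c")])


if __name__ == "__main__":
    print(demo_sqlite_graph())
```

The graph is never copied into each process; every worker pulls only the rows it queries, and the operating system's page cache is shared between them.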

The last solution is to build a file that you mmap into memory and read from there. But this is really only a solution if your graph is read-only, because if you expect to modify the graph you will end up writing an mmap'ed graph database. This has the advantage of being fully in memory.
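A minimal sketch of the read-only mmap approach, assuming nodes are numbered 0..n-1. The file layout is a hypothetical compressed adjacency format: the node count, then n+1 offsets, then all successor ids, each stored as a little-endian int32. `write_graph`, `MappedGraph` and `demo_mmap_graph` are illustrative names, not an existing library. In a real setup each process would construct its own `MappedGraph` on the same file, and the OS would back all of them with the same physical pages.

```python
import mmap
import os
import struct
import tempfile

INT = struct.Struct("<i")  # little-endian int32


def write_graph(path, num_nodes, edges):
    """Layout: [n][offsets: n + 1 ints][targets: one int per edge]."""
    succ = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        succ[src].append(dst)
    offsets, targets = [0], []
    for node_succ in succ:
        targets.extend(sorted(node_succ))
        offsets.append(len(targets))
    values = [num_nodes] + offsets + targets
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}i", *values))


class MappedGraph:
    def __init__(self, path):
        with open(path, "rb") as f:
            # The mapping stays valid after the file object is closed.
            self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        (self._n,) = INT.unpack_from(self._mm, 0)

    def _int_at(self, index):
        return INT.unpack_from(self._mm, INT.size * index)[0]

    def successors(self, node):
        start = self._int_at(1 + node)  # offsets begin at index 1
        end = self._int_at(2 + node)
        base = 2 + self._n  # targets begin after the n + 1 offsets
        count = end - start
        return list(struct.unpack_from(f"<{count}i", self._mm,
                                       INT.size * (base + start)))

    def close(self):
        self._mm.close()


def demo_mmap_graph():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "graph.bin")
        write_graph(path, 3, [(0, 1), (0, 2), (1, 2)])
        graph = MappedGraph(path)  # each process would open its own mapping
        try:
            return [graph.successors(node) for node in range(3)]
        finally:
            graph.close()


if __name__ == "__main__":
    print(demo_mmap_graph())
```

Because the file is opened with `ACCESS_READ`, a modification attempt fails instead of silently diverging between processes, which matches the read-only restriction above.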

My recommendation is to use lmdb to build a graph database, taking the simpler version of ajgu as an example, with two scripts:

  • One to create the database
  • Another with a class that uses the graph from different processes.
amirouche answered Sep 21 '22 03:09
