
Share Python objects between processes in Python 3

I wrote a producer-consumer program: the parent process (producer) creates many child processes (consumers), then reads a file and passes the data to the children.

But there is a performance problem: passing messages between processes costs too much time (I think).

For example, with 200 MB of original data, reading and pretreating it in the parent process takes less than 8 seconds, then just passing the data to the child processes through multiprocessing.Pipe takes another 8 seconds, and the child processes do the remaining work in another 3 to 4 seconds.

So a complete workflow takes less than 18 seconds, and more than 40% of that time is spent on communication between processes. That is much more than I expected. I also tried multiprocessing.Queue and Manager, and they are worse.

I work with Windows 7 / Python 3.4. I have googled for several days, and POSH may be a good solution, but it can't be built with Python 3.4.
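One way to check how much of that communication time is serialization: Connection.send() pickles the object before writing it to the pipe, so timing pickle alone gives a rough lower bound. This is only a sketch. The helper name time_pickle and the sample data are hypothetical stand-ins, not the data from the question.

```python
import pickle
import time


def time_pickle(obj, repeats=3):
    """Return (best_seconds, payload_size) for pickling obj."""
    best = float("inf")
    payload = b""
    for _ in range(repeats):
        start = time.perf_counter()
        payload = pickle.dumps(obj)
        best = min(best, time.perf_counter() - start)
    return best, len(payload)


if __name__ == "__main__":
    # Hypothetical stand-ins: many small records versus one flat bytes blob.
    records = [(i, str(i), float(i)) for i in range(200000)]
    blob = b"x" * (8 * 1024 * 1024)
    for name, obj in (("records", records), ("bytes blob", blob)):
        seconds, size = time_pickle(obj)
        print("{}: {:.3f} s for {} bytes".format(name, seconds, size))
```

If the list of records takes far longer per byte than the blob, the cost is probably in pickling many small objects rather than in the pipe itself. In that case, packing the data into bytes in the producer and using send_bytes() may be worth trying.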

I see three options:

1. Is there any way to share a Python object directly between processes in Python 3.4, as POSH does?

or

2. Is it possible to pass a "pointer" to an object to a child process, so that the child process can turn the "pointer" back into a Python object?

or

3. multiprocessing.Array may be a valid solution, but if I want to share a complex data structure, such as a list, how does that work? Should I write a new class based on it and provide the same interface as list?

Edit 1: I tried the 3rd way, but it performs worse.
I defined these values:

import multiprocessing

p_pos = multiprocessing.Value('i')  # producer write position
c_pos = multiprocessing.Value('i')  # consumer read position
databuff = multiprocessing.Array('c', buff_len)  # shared buffer

and two functions:

send_data(msg)  
get_data()

The send_data function (parent process) copies msg into databuff and sends the start and end positions (two integers) to the child process via the pipe.
The get_data function (child process) then receives the two positions and copies the msg out of databuff.

In the end, it takes twice as long as just using the pipe @_@
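The scheme described in Edit 1 could be sketched roughly as follows. This is not the original code from the question: the function signatures, BUFF_LEN, the consumer function and lock=False are all assumptions made for illustration. It copies with a single slice assignment instead of a per-byte loop, and skips the Array lock on the assumption that the pipe message already orders the write before the read.

```python
import multiprocessing as mp

BUFF_LEN = 1024  # hypothetical buffer size, not taken from the question


def send_data(databuff, conn, pos, msg):
    """Copy msg (bytes) into the shared buffer and send its (start, end)."""
    end = pos + len(msg)
    if end > len(databuff):
        raise ValueError("message does not fit in the shared buffer")
    databuff[pos:end] = msg   # one bulk slice assignment
    conn.send((pos, end))     # only two small integers go through the pipe
    return end


def get_data(databuff, conn):
    """Receive (start, end) and copy that slice out of the shared buffer."""
    start, end = conn.recv()
    return databuff[start:end]  # bytes copy of the shared region


def consumer(databuff, conn):
    """Hypothetical child: read one message and report its length."""
    msg = get_data(databuff, conn)
    conn.send(len(msg))


if __name__ == "__main__":
    # lock=False returns the raw ctypes array (an assumption, see above).
    databuff = mp.Array("c", BUFF_LEN, lock=False)
    parent_conn, child_conn = mp.Pipe()
    child = mp.Process(target=consumer, args=(databuff, child_conn))
    child.start()
    child_conn.close()  # parent keeps only its own end of the pipe
    send_data(databuff, parent_conn, 0, b"hello consumer")
    print("child received", parent_conn.recv(), "bytes")
    child.join()
```

Note that the data is still copied twice (into the buffer and out of it), so this only helps if those copies are cheaper than pickling the message for the pipe.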

Edit 2:
Yes, I tried Cython, and the result looks good.
I just changed my Python script's suffix to .pyx and compiled it, and the program got about 15% faster.
Of course, I ran into the "Unable to find vcvarsall.bat" and "The system cannot find the file specified" errors. I spent a whole day solving the first one and was then blocked by the second.
Finally, I found Cyther, and all the trouble went away ^_^.

Allan Libra asked Sep 25 '16 13:09



1 Answer

I was in your place five months ago. I looked around a few times, but my conclusion is that multiprocessing in Python has exactly the problem you describe:

  • Pipes and Queues are good, but in my experience not for big objects
  • Manager() proxy objects are slow, except arrays, and those are limited. If you want to share a complex data structure, use a Namespace as is done here: multiprocessing in python - sharing large object (e.g. pandas dataframe) between multiple processes
  • Manager() has the shared list you are looking for: https://docs.python.org/3.6/library/multiprocessing.html
  • There are no pointers or real memory management in Python, so you can't share selected memory cells
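For reference, a minimal sketch of the Manager() list and Namespace mentioned above. The names consumer, shared_rows and settings are hypothetical. Every access through a proxy is a round trip to the manager process, which is the likely reason it is slow for large data.

```python
import multiprocessing as mp


def consumer(shared_rows, settings):
    """Hypothetical child: read the first row and append a derived value."""
    first_row = shared_rows[0]  # with a list proxy this returns a copy
    shared_rows.append(sum(first_row) * settings.scale)


if __name__ == "__main__":
    with mp.Manager() as manager:
        shared_rows = manager.list([[1, 2, 3]])  # shared list proxy
        settings = manager.Namespace()           # shared attribute holder
        settings.scale = 10
        child = mp.Process(target=consumer, args=(shared_rows, settings))
        child.start()
        child.join()
        print(list(shared_rows))
```

Mutating a nested object fetched from the proxy (for example first_row above) does not propagate back to the manager. Only operations on the proxy itself, such as append, are shared.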

I solved this kind of problem by learning C++, but it's probably not what you want to read...

Jean-Baptiste F. answered Oct 18 '22 07:10