Here I create a producer-customer program,the parent process(producer) create many child process(consumer),then parent process read file and pass data to child process.
but , here comes a performance problem,pass message between process cost too much time (I think).
for an example ,a 200MB original data ,parent process read and pretreat will cost less then 8 seconds , than just pass data to child process by multiprocess.pipe will cost another 8 seconds , and child processes do the remain work just cost another 3 ~ 4 seconds.
so ,a complete work flow cost less than 18 seconds ,and more than 40% time cost on communication between process , it is much bigger than I used think about ,and I tried multiprocess.Queue and Manager ,they are worse.
I works with windows7 / Python3.4. I had google for several days , and POSH maybe a good solution , but it can't build with python3.4
there I have 3 ways:
1.is there any way can share python object direct between process in Python3.4 ? as POSH
or
2.is it possable pass the "pointer" of an object to child process and child process can recovery the "pointer" to python object?
or
3.multiprocess.Array may be a valid solution , but if I want share complex data structure, such as list, how it works? should I make a new class base on it and provide interfaces as list?
Edit1:
I tried the 3rd way,but it works worse.
I defined those value:
p_pos = multiprocessing.Value('i') #producer write position
c_pos = multiprocessing.Value('i') #customer read position
databuff = multiprocess.Array('c',buff_len) # shared buffer
and two function:
send_data(msg)
get_data()
in send_data function(parent process),it copies msg to databuff , and send the start and end position (two integer)to child process via pipe.
than in get_data function (child process) ,it received the two position and copy the msg from databuff.
in final,it cost twice than just use pipe @_@
Edit 2:
Yes , I tried Cython ,and the result looks good.
I just changed my python script's suffix to .pyx and compile it ,and the program speed up for 15%.
No doubt , I met the " Unable to find vcvarsall.bat" and " The system cannot find the file specified" error , and I cost whole day for solved the first one , and blocked by the second one.
Finally , I found Cyther , and all troubles gone ^_^.
Passing Messages to Processes A simple way to communicate between process with multiprocessing is to use a Queue to pass messages back and forth. Any pickle-able object can pass through a Queue. This short example only passes a single message to a single worker, then the main process waits for the worker to finish.
You can join a process pool by calling join() on the pool after calling close() or terminate() in order to wait for all processes in the pool to be shutdown.
Every object has two methods – send() and recv(), to communicate between processes.
I was at your place five month ago. I looked around few times but my conclusion is multiprocessing with Python has exactly the problem you describe :
I solved this kind of problem by learning C++, but it's probably not what you want to read...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With