I am performing some large computations on 3 different numpy 2D arrays sequentially. The arrays are huge, 25000x25000 each. Each computation takes significant time, so I decided to run the 3 of them in parallel on 3 CPU cores on the server. I am following the standard multiprocessing guidelines and creating 2 processes and a worker function. Two computations run in the 2 processes, and the third runs locally in the parent without a separate process. I am passing the huge arrays as arguments to the processes like this:
p1 = Process(target = Worker, args = (queue1, array1, ...)) # Some other params also going
p2 = Process(target = Worker, args = (queue2, array2, ...)) # Some other params also going
The Worker function sends back two numpy vectors (1D arrays) in a list put on the queue, like this:
queue.put([v1, v2])
I am not using multiprocessing.pool. Surprisingly, I am not getting any speedup; it is actually running 3 times slower. Is passing the large arrays what takes the time? I am unable to figure out what is going on. Should I use shared memory objects instead of passing arrays?
I shall be thankful if anybody can help.
Thank you.
My problem appears to be resolved. I was working inside a Django module, and from there I was calling multiprocessing.pool.map_async. My worker function was a method of the class itself. That was the problem. As far as I understand, subprocesses do not share memory with the parent, so the worker function and its arguments have to be pickled to reach the other process; on Python 2 a bound method cannot be pickled, which is probably why it was not getting called. I moved the function out of the class and put it in the same file at module level, just before the class definition. It worked, and I got a moderate speedup as well.

One more thing for people facing the same problem: please do not read large arrays in the parent and pass them between processes. Pickling and unpickling take a lot of time, and you get a slowdown rather than a speedup. Try to read the arrays inside the subprocess itself.
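Here is a minimal sketch of that arrangement, not my actual code. It assumes the arrays are stored as .npy files; the names worker, run_parallel and the file names are made up for illustration, and the column and row sums are placeholders for the real computation. The point is that the worker is a plain module-level function, and only a file path goes into the process and only two small vectors come back.

```python
import os
import tempfile
from multiprocessing import Process, Queue

import numpy as np


def worker(queue, path):
    """Module-level worker: load the array inside the child process."""
    arr = np.load(path)        # the big array is read here, not in the parent
    v1 = arr.sum(axis=0)       # placeholder computation (assumption)
    v2 = arr.sum(axis=1)       # placeholder computation (assumption)
    queue.put([v1, v2])        # only the two small vectors get pickled


def run_parallel(paths):
    """Start one process per file and collect [v1, v2] from each."""
    jobs = []
    for path in paths:
        q = Queue()
        p = Process(target=worker, args=(q, path))
        p.start()
        jobs.append((p, q))
    results = []
    for p, q in jobs:
        results.append(q.get())  # read the result before join()
        p.join()
    return results


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    paths = []
    for i in range(2):
        path = os.path.join(tmpdir, "array%d.npy" % i)
        np.save(path, np.random.rand(200, 200))  # small stand-in for 25000x25000
        paths.append(path)
    for v1, v2 in run_parallel(paths):
        print(v1.shape, v2.shape)
```

Calling q.get() before p.join() matters: a child that still has data waiting in its queue may not exit until that data has been read.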
And if possible, please use numpy.memmap arrays; they are quite fast.
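A rough sketch of the memmap idea, assuming the array lives in a raw binary file whose shape and dtype are known in advance. SHAPE, DTYPE, create_array_file and row_block_sums are hypothetical names, and the row sum stands in for the real computation. Each worker would open the file read-only by path and touch only the rows it needs, so only the path and two integers have to be passed to it.

```python
import os
import tempfile

import numpy as np

SHAPE = (1000, 500)    # small stand-in for 25000x25000 (assumption)
DTYPE = np.float64     # assumed element type


def create_array_file(path):
    """Write a test array to disk through a writable memmap."""
    mm = np.memmap(path, dtype=DTYPE, mode="w+", shape=SHAPE)
    mm[:] = np.arange(SHAPE[0] * SHAPE[1], dtype=DTYPE).reshape(SHAPE)
    mm.flush()         # make sure the data reaches the file
    del mm


def row_block_sums(path, start, stop):
    """Open the file read-only and sum each row in [start, stop)."""
    mm = np.memmap(path, dtype=DTYPE, mode="r", shape=SHAPE)
    # Only the requested rows are paged in from disk.
    return np.asarray(mm[start:stop].sum(axis=1))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "big_array.dat")
    create_array_file(path)
    print(row_block_sums(path, 0, 3))
```

A function like row_block_sums can be used as the target of a Process in the same way as any other module-level worker.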