Python multiprocessing and shared numpy array

I have a problem similar to this:

import numpy as np

C = np.zeros((100,10))

for i in range(10):
    C_sub = get_sub_matrix_C(i, other_args) # shape 10x10
    C[i*10:(i+1)*10,:10] = C_sub

Since each submatrix can be calculated independently, there is apparently no need to run this as a serial calculation. I would like to use the multiprocessing module and create up to 4 processes for the for loop. I read some tutorials about multiprocessing, but wasn't able to figure out how to apply it to my problem.

Thanks for your help

Asked Sep 25 '22 by RoSt

1 Answer

A simple way to parallelize that code would be to use a Pool of processes:

import multiprocessing

pool = multiprocessing.Pool()
results = pool.starmap(get_sub_matrix_C, ((i, other_args) for i in range(10)))

for i, res in enumerate(results):
    C[i*10:(i+1)*10,:10] = res

I've used starmap since the get_sub_matrix_C function has more than one argument (starmap(f, [(x1, ..., xN)]) calls f(x1, ..., xN)).
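For illustration, here is a minimal, self-contained demonstration of that unpacking behavior (the `add` and `run` functions are mine, not from the question):

```python
from multiprocessing import Pool

def add(a, b):
    return a + b

def run():
    with Pool(2) as pool:
        # starmap unpacks each tuple into positional arguments,
        # i.e. it calls add(1, 2) and add(3, 4)
        return pool.starmap(add, [(1, 2), (3, 4)])

if __name__ == "__main__":
    print(run())  # [3, 7]
```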

Note however that serialization/deserialization may take significant time and space, so you may have to use a more low-level solution to avoid that overhead.
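One such lower-level approach, sketched below, is `multiprocessing.shared_memory` (Python 3.8+): each worker writes its submatrix directly into a shared buffer, so the results are never pickled back to the parent. The `worker` and `compute` names are mine, and `get_sub_matrix_C` is a placeholder standing in for the asker's real function (taking only `i` here, without `other_args`):

```python
import numpy as np
from multiprocessing import Pool
from multiprocessing.shared_memory import SharedMemory

SHAPE = (100, 10)

def get_sub_matrix_C(i):
    # placeholder for the real 10x10 computation
    return np.full((10, 10), float(i))

def worker(args):
    i, shm_name = args
    # attach to the shared block and view it as the result matrix
    shm = SharedMemory(name=shm_name)
    C = np.ndarray(SHAPE, dtype=np.float64, buffer=shm.buf)
    C[i*10:(i+1)*10, :] = get_sub_matrix_C(i)
    del C           # release the view before closing the handle
    shm.close()

def compute():
    shm = SharedMemory(create=True, size=np.zeros(SHAPE).nbytes)
    try:
        with Pool(4) as pool:
            pool.map(worker, [(i, shm.name) for i in range(10)])
        # copy out of shared memory before releasing it
        result = np.ndarray(SHAPE, dtype=np.float64, buffer=shm.buf).copy()
    finally:
        shm.close()
        shm.unlink()
    return result
```

Only the small argument tuples and the shared-memory name cross the process boundary; the 10x10 result blocks themselves are written in place.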


It looks like you are running an outdated version of Python (Pool.starmap requires Python 3.3+). You can replace starmap with plain map, but then you have to provide a wrapper function that takes a single argument:

def f(args):
    return get_sub_matrix_C(*args)

import multiprocessing

pool = multiprocessing.Pool()
results = pool.map(f, ((i, other_args) for i in range(10)))

for i, res in enumerate(results):
    C[i*10:(i+1)*10,:10] = res

Answered Oct 11 '22 by Bakuriu