
How to use multiprocessing to accelerate the following function?

I have the following for loop:

for j in range(len(a_nested_list_of_ints)):
    arr_1_, arr_2_, arr_3_ = foo(a_nested_list_of_ints[j])
    arr_1[j,:] = arr_1_.data.numpy()
    arr_2[j,:] = arr_2_.data.numpy()
    arr_3[j,:] = arr_3_.data.numpy()

Where a_nested_list_of_ints is a nested list of ints. However, it is taking a long time to finish. How can I optimize it with multiprocessing? So far I have tried the following:

p = Pool(5)
for j in range(len(a_nested_list_of_ints)):
    arr_1_, arr_2_, arr_3_ = p.map(foo,a_nested_list_of_ints[j])
    arr_1[j,:] = arr_1_.data.numpy()
    arr_2[j,:] = arr_2_.data.numpy()
    arr_3[j,:] = arr_3_.data.numpy()

However, I am getting:

ValueError: not enough values to unpack (expected 3, got 2)

here:

    arr_1_, arr_2_, arr_3_ = p.map(foo,a_nested_list_of_ints[j])

Any idea how to make the above operation faster? I also tried starmap, but it isn't working either.

asked May 15 '19 by anon



1 Answer

Here's a pool demo that works:

In [10]: import numpy as np, multiprocessing

In [11]: def foo(i): 
    ...:     return np.arange(i), np.arange(10-i) 
    ...:                                                                        
In [12]: with multiprocessing.Pool(processes=2) as pool: 
    ...:     x = pool.map(foo, range(10)) 
    ...:                                                                        
In [13]: x                                                                      
Out[13]: 
[(array([], dtype=int64), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])),
 (array([0]), array([0, 1, 2, 3, 4, 5, 6, 7, 8])),
 (array([0, 1]), array([0, 1, 2, 3, 4, 5, 6, 7])),
 (array([0, 1, 2]), array([0, 1, 2, 3, 4, 5, 6])),
 (array([0, 1, 2, 3]), array([0, 1, 2, 3, 4, 5])),
 (array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4])),
 (array([0, 1, 2, 3, 4, 5]), array([0, 1, 2, 3])),
 (array([0, 1, 2, 3, 4, 5, 6]), array([0, 1, 2])),
 (array([0, 1, 2, 3, 4, 5, 6, 7]), array([0, 1])),
 (array([0, 1, 2, 3, 4, 5, 6, 7, 8]), array([0]))]

pool.map is doing the iteration, not some external for loop.
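
In the question's snippet, p.map(foo, a_nested_list_of_ints[j]) maps foo over the ints inside a single inner list, so the number of results matches that inner list's length rather than the three values being unpacked, which is most likely why the unpack fails. A minimal sketch of the intended call shape (not from the original answer), reusing the question's foo and a_nested_list_of_ints, which are assumed to be defined at module level:

from multiprocessing import Pool

# foo and a_nested_list_of_ints are the names from the question
if __name__ == '__main__':
    with Pool(processes=5) as pool:
        # a single map over the outer list; no outer for loop
        results = pool.map(foo, a_nested_list_of_ints)
    # results is a list with one (arr_1_, arr_2_, arr_3_) tuple per inner list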

And to get a little closer to your example:

In [14]: def foo(alist): 
    ...:     return np.arange(*alist), np.zeros(alist,int) 
    ...:      
    ...:                                                                        
In [15]: alists=[(0,3),(1,4),(1,6,2)]                                           
In [16]: with multiprocessing.Pool(processes=2) as pool: 
    ...:     x = pool.map(foo, alists) 
    ...:                                                                        
In [17]: x                                                                      
Out[17]: 
[(array([0, 1, 2]), array([], shape=(0, 3), dtype=int64)),
 (array([1, 2, 3]), array([[0, 0, 0, 0]])),
 (array([1, 3, 5]), array([[[0, 0],
          [0, 0],
          [0, 0],
          [0, 0],
          [0, 0],
          [0, 0]]]))]

Note that pool.map returns a list, with one result tuple per element of alists. It doesn't make sense to unpack that x directly:

 x,y = pool.map(...)   # ValueError: too many values to unpack

I can unpack x using the zip(*...) idiom:

In [21]: list(zip(*x))                                                          
Out[21]: 
[(array([0, 1, 2]), array([1, 2, 3]), array([1, 3, 5])),
 (array([], shape=(0, 3), dtype=int64), array([[0, 0, 0, 0]]), array([[[0, 0],
          [0, 0],
          [0, 0],
          [0, 0],
          [0, 0],
          [0, 0]]]))]

This is a list of 2 tuples; in effect a list version of a transpose. It can be unpacked:

In [23]: y,z = zip(*x)                                                          
In [24]: y                                                                      
Out[24]: (array([0, 1, 2]), array([1, 2, 3]), array([1, 3, 5]))
In [25]: z                                                                      
Out[25]: 
(array([], shape=(0, 3), dtype=int64), array([[0, 0, 0, 0]]), array([[[0, 0],
         [0, 0],
         [0, 0],
         [0, 0],
         [0, 0],
         [0, 0]]]))
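
To tie this back to the question's loop: a minimal sketch (not part of the original answer), assuming foo takes one inner list and returns three tensor-like objects whose .data.numpy() results all have a consistent shape so they can be stacked:

import numpy as np
from multiprocessing import Pool

# foo and a_nested_list_of_ints are the names from the question;
# foo is assumed to return three objects exposing .data.numpy()
if __name__ == '__main__':
    with Pool(processes=5) as pool:
        results = pool.map(foo, a_nested_list_of_ints)   # list of 3-tuples

    # zip(*...) transposes the list of 3-tuples into three groups
    arr_1_parts, arr_2_parts, arr_3_parts = zip(*results)

    # stack each group row by row
    arr_1 = np.array([t.data.numpy() for t in arr_1_parts])
    arr_2 = np.array([t.data.numpy() for t in arr_2_parts])
    arr_3 = np.array([t.data.numpy() for t in arr_3_parts])

Whether this actually runs faster depends on how expensive each foo call is compared with the cost of pickling its inputs and results between processes; for cheap calls, passing a larger chunksize to pool.map can reduce that overhead.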

answered Nov 03 '22 by hpaulj