I have the following for loop:
for j in range(len(a_nested_list_of_ints)):
    arr_1_, arr_2_, arr_3_ = foo(a_nested_list_of_ints[j])
    arr_1[j, :] = arr_1_.data.numpy()
    arr_2[j, :] = arr_2_.data.numpy()
    arr_3[j, :] = arr_3_.data.numpy()
where a_nested_list_of_ints is a nested list of ints. However, it is taking a long time to finish. How can I optimize it with multiprocessing? So far I have tried:
from multiprocessing import Pool

p = Pool(5)
for j in range(len(a_nested_list_of_ints)):
    arr_1_, arr_2_, arr_3_ = p.map(foo, a_nested_list_of_ints[j])
    arr_1[j, :] = arr_1_.data.numpy()
    arr_2[j, :] = arr_2_.data.numpy()
    arr_3[j, :] = arr_3_.data.numpy()
However, I am getting:
ValueError: not enough values to unpack (expected 3, got 2)
here:
arr_1_, arr_2_, arr_3_ = p.map(foo,a_nested_list_of_ints[j])
Any idea how to make the above operation faster? I also tried starmap, but it isn't working either.
Multiprocessing can dramatically improve processing speed. The speedup isn't exactly proportional to the number of processors available, because of the overhead involved in creating worker processes, but the gains still represent a significant improvement over single-core operation.
To use the module directly, import the Process class and create a Process object with a target function; the process is started with the start() method and completed with the join() method. Arguments can be passed to the target function with the args keyword, as in the sketch below.
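A minimal sketch of that workflow; the worker function and its argument are placeholders, not anything from the question:

import multiprocessing

def worker(name):
    # Placeholder target; any picklable module-level function works.
    print("hello from", name)

if __name__ == "__main__":
    # Build the process around a target function and its arguments.
    p = multiprocessing.Process(target=worker, args=("process-1",))
    p.start()  # launch the child process
    p.join()   # block until it finishes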
Multiprocessing, as a general term, can mean the dynamic assignment of a program to one of two or more computers working in tandem, or multiple computers working on the same program at the same time (in parallel).
From the Python docs ("multiprocessing — Process-based parallelism"): "A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation."
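As a small illustration of that interface (the square function here is a placeholder, not from the question): apply_async returns an AsyncResult whose get() takes an optional timeout, and map_async accepts a callback:

import multiprocessing

def square(i):
    return i * i

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        # Submit one job asynchronously; get() blocks, with a timeout.
        res = pool.apply_async(square, (10,))
        print(res.get(timeout=5))  # 100
        # Parallel map whose callback receives the whole result list.
        pool.map_async(square, range(4), callback=print).wait()  # [0, 1, 4, 9]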
Here's a pool demo that works:
In [10]: import multiprocessing
    ...: import numpy as np

In [11]: def foo(i):
...: return np.arange(i), np.arange(10-i)
...:
In [12]: with multiprocessing.Pool(processes=2) as pool:
...: x = pool.map(foo, range(10))
...:
In [13]: x
Out[13]:
[(array([], dtype=int64), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])),
(array([0]), array([0, 1, 2, 3, 4, 5, 6, 7, 8])),
(array([0, 1]), array([0, 1, 2, 3, 4, 5, 6, 7])),
(array([0, 1, 2]), array([0, 1, 2, 3, 4, 5, 6])),
(array([0, 1, 2, 3]), array([0, 1, 2, 3, 4, 5])),
(array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4])),
(array([0, 1, 2, 3, 4, 5]), array([0, 1, 2, 3])),
(array([0, 1, 2, 3, 4, 5, 6]), array([0, 1, 2])),
(array([0, 1, 2, 3, 4, 5, 6, 7]), array([0, 1])),
(array([0, 1, 2, 3, 4, 5, 6, 7, 8]), array([0]))]
pool.map is doing the iteration, not some external for loop.
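So instead of calling p.map inside the loop on one inner list at a time, map foo once over the whole outer list and fill the rows afterwards. A sketch, assuming the question's foo and the preallocated arr_1, arr_2, arr_3:

from multiprocessing import Pool

# foo must be defined at module level so the workers can unpickle it.
with Pool(5) as pool:
    # One map over the outer list; each worker gets one inner list.
    results = pool.map(foo, a_nested_list_of_ints)

for j, (arr_1_, arr_2_, arr_3_) in enumerate(results):
    arr_1[j, :] = arr_1_.data.numpy()
    arr_2[j, :] = arr_2_.data.numpy()
    arr_3[j, :] = arr_3_.data.numpy()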
And to get a little closer to your example, with a foo that takes a single list argument:
In [14]: def foo(alist):
...: return np.arange(*alist), np.zeros(alist,int)
...:
...:
In [15]: alists=[(0,3),(1,4),(1,6,2)]
In [16]: with multiprocessing.Pool(processes=2) as pool:
...: x = pool.map(foo, alists)
...:
In [17]: x
Out[17]:
[(array([0, 1, 2]), array([], shape=(0, 3), dtype=int64)),
(array([1, 2, 3]), array([[0, 0, 0, 0]])),
(array([1, 3, 5]), array([[[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0]]]))]
Note that pool.map returns a list, with all cases generated from alists. It doesn't make sense to unpack that x:
x, y = pool.map(...)  # ValueError: too many values to unpack
I can unpack the x using the zip(*...) idiom:
In [21]: list(zip(*x))
Out[21]:
[(array([0, 1, 2]), array([1, 2, 3]), array([1, 3, 5])),
(array([], shape=(0, 3), dtype=int64), array([[0, 0, 0, 0]]), array([[[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0]]]))]
This is a list of 2 tuples; in effect, a list version of a transpose. It can be unpacked:
In [23]: y,z = zip(*x)
In [24]: y
Out[24]: (array([0, 1, 2]), array([1, 2, 3]), array([1, 3, 5]))
In [25]: z
Out[25]:
(array([], shape=(0, 3), dtype=int64), array([[0, 0, 0, 0]]), array([[[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0]]]))
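The same idiom applies to the original problem: transpose the mapped results with zip and convert each group in one step, instead of filling row by row. Again a sketch, assuming the question's foo returns three tensor-like values per inner list:

import numpy as np
from multiprocessing import Pool

with Pool(5) as pool:
    results = pool.map(foo, a_nested_list_of_ints)

# zip(*results) transposes the list of (arr_1_, arr_2_, arr_3_) tuples
# into three tuples, one per output array.
out_1, out_2, out_3 = zip(*results)
arr_1 = np.array([t.data.numpy() for t in out_1])
arr_2 = np.array([t.data.numpy() for t in out_2])
arr_3 = np.array([t.data.numpy() for t in out_3])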