I'm seeking to populate a large numpy array using multiprocessing. I've worked through the concurrent.futures examples in the documentation but haven't gained enough of an understanding to adapt them to my use case.
Here's a simplified version of what I'd like to do:
import numpy
import concurrent.futures
squares = numpy.empty((20, 2))
def make_square(i, squares):
    print('iteration', i)
    squares[i, 0], squares[i, 1] = i, i ** 2

with concurrent.futures.ProcessPoolExecutor(2) as executor:
    for i in range(20):
        executor.submit(make_square, i, squares)
The output runs something like:
iteration 1
iteration 0
iteration 2
iteration 3
iteration 5
iteration 4
iteration 6
iteration 7
iteration 8
iteration 9
iteration 10
iteration 11
iteration 12
iteration 13
iteration 15
iteration 14
iteration 16
iteration 17
iteration 18
iteration 19
which nicely demonstrates that the function is running concurrently. But the squares array is still empty.
What is the correct syntax to populate the squares array?
Secondly, would using .map be a better implementation?
Thanks in advance!
8/2/17 Wow. So I wandered into reddit-land because I wasn't getting any takers for this problem. So happy to be back here at stackoverflow. Thanks @ilia w495 nikitin and @donkopotamus. Here's what I posted in reddit, which explains the background to this problem in more detail.
The posted code is an analogy of what I'm trying to do, which is populating a numpy array with a relatively simple calculation (dot product) involving two other arrays. The algorithm depends on a value N which can be anything from 1 on up, though we won't likely use a value larger than 24. I'm currently running the algorithm on a distributed computing system, and the N = 20 versions take longer than 10 days to complete. I'm using dozens of cores to obtain the required memory, but gaining none of the benefits of multiple CPUs. I've rewritten the code using numba, which makes lower-N variants superfast on my own laptop, but my laptop can't handle the memory requirements for larger Ns, and alas, our distributed computing environment is not currently able to install numba. So I'm attempting concurrent.futures to take advantage of the multiple CPUs in our computing environment in the hope of speeding things up.
So it's not the computation that is time-intensive, it's the 16-million-plus iterations. The initialized array is N x 2**N, i.e. range(16777216) would replace range(20) in the code above.
It may be that it's simply not possible to populate an array through multiprocessing.
The issue here is that a ProcessPoolExecutor will execute a function within a separate process. As these are separate processes, with separate memory spaces, you cannot expect that any changes they make to your array (squares) will be reflected in the parent. Hence your original array is unchanged (as you are discovering).
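To see this concretely, here is a minimal sketch (not part of the original answer; the array name data and the helper fill_row are made up for illustration, and the default start method is assumed) showing that a worker's write lands in its own copy of the array while the parent's copy stays untouched:

import numpy
import concurrent.futures

data = numpy.zeros(4)

def fill_row(i):
    data[i] = 99  # modifies the worker process's copy of data
    return data[i]

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(2) as executor:
        returned = list(executor.map(fill_row, range(4)))
    print(returned)  # [99.0, 99.0, 99.0, 99.0] -- each worker saw its own write
    print(data)      # [0. 0. 0. 0.] -- the parent's array is unchanged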
You need to do either of the following:

- use a ThreadPoolExecutor instead, but be aware that in the general case you still should not try to modify global variables from multiple threads (a sketch of this option appears after the output below); or
- have each worker compute and return its data, and write it into the array in the parent process.

The latter would look like this:
squares = numpy.zeros((20, 2))

def make_square(i):
    print('iteration', i)
    # compute expensive data here ...
    # return row number and the computed data
    return i, [i, i ** 2]

with concurrent.futures.ProcessPoolExecutor(2) as executor:
    for row, result in executor.map(make_square, range(20)):
        squares[row] = result
This will produce the result you expect:
[[ 0. 0.]
[ 1. 1.]
[ 2. 4.]
...
[ 18. 324.]
[ 19. 361.]]
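For completeness, here is a minimal sketch of the first option, a ThreadPoolExecutor; it is not part of the original answer. Threads share the parent's memory, so the in-place writes are visible afterwards, and since each call writes to a distinct row there is no race on any single element. Note that for CPU-bound work the GIL will prevent threads from giving a real speedup, so this mainly illustrates the memory-sharing point:

import numpy
import concurrent.futures

squares = numpy.zeros((20, 2))

def make_square(i):
    # Threads share memory with the parent, so this write is visible after the pool finishes.
    squares[i, 0], squares[i, 1] = i, i ** 2

with concurrent.futures.ThreadPoolExecutor(2) as executor:
    # consume the iterator so any exception raised in a worker is re-raised here
    list(executor.map(make_square, range(20)))

print(squares)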