
Shared Non-Contiguous-Access Numpy Array

I have a numpy array that I would like to share between a bunch of python processes in a way that doesn't involve copies. I create a shared numpy array from an existing numpy array using the sharedmem package.

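import sharedmem as shm


def convert_to_shared_array(A):
    # Allocate a shared-memory array with the same shape and dtype as A.
    shared_array = shm.shared_empty(A.shape, A.dtype, order="C")
    # Fill the shared buffer with the contents of A (a one-time copy).
    shared_array[...] = A
    return shared_array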

My problem is that each subprocess needs to access rows that are randomly distributed in the array. I pass the shared array to each subprocess, and each process also has a list, idx, of the rows that it needs to access. The problem occurs in the subprocess when I do:

# idx = list of randomly distributed integers

local_array = shared_array[idx,:]

# Do stuff with local array

This creates a copy of the selected rows instead of just another view. The array is quite large, and rearranging it before sharing it so that each process accesses a contiguous range of rows like

local_array = shared_array[start:stop,:]

takes too long.

Question: What are good solutions for sharing random access to a numpy array between python processes that don't involve copying the array?

The subprocesses need readonly access (so no need for locking on access).

Anthony Bak asked Nov 13 '22 12:11



1 Answer

Fancy indexing always produces a copy, so if you want to avoid copies you need to avoid fancy indexing. There is no way around it.
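To make the difference concrete, here is a minimal sketch that checks which indexing forms share memory with the original array. It uses a small plain ndarray as a stand-in for the shared array (an assumption: indexing behaves the same way on a plain array), and the idx values are made up for illustration. It also shows one possible workaround, not necessarily the fastest: indexing one row at a time with a plain integer is basic indexing, so each row comes back as a view.

```python
import numpy as np

# Stand-in for the shared array (assumption: a plain ndarray is used here
# only to illustrate indexing behaviour).
shared_array = np.arange(20, dtype=np.float64).reshape(5, 4)
idx = [3, 0, 4]  # hypothetical, randomly distributed row indices

# Fancy (integer-list) indexing gathers the rows into a new buffer: a copy.
fancy = shared_array[idx, :]
fancy_is_view = np.shares_memory(fancy, shared_array)

# Basic slicing only changes the offset and strides: a view.
sliced = shared_array[1:3, :]
slice_is_view = np.shares_memory(sliced, shared_array)

# A single integer index is also basic indexing, so looping over idx
# touches one row at a time without copying the whole selection.
row_is_view = np.shares_memory(shared_array[idx[0]], shared_array)
row_sums = [float(shared_array[i].sum()) for i in idx]

print(fancy_is_view, slice_is_view, row_is_view)
```

The per-row loop trades the copy for Python-level iteration, so whether it helps depends on how much work is done per row.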

David Cournapeau answered Dec 18 '22 07:12
