
Efficiently filling NumPy array using lists of indices

I know how to execute a parallel loop in joblib that returns a list as result.

However, is it possible to fill a predefined numpy matrix in parallel?

Imagine the following minimal example matrix and data:

import numpy as np

column_data = ['a', 'b', 'c', 'd', 'e', 'f', 'x']
data = [['a', 'b', 'c'],
        ['d', 'c'],
        ['e', 'f', 'd', 'x']]
x = np.zeros((len(data), len(column_data)))

Note that column_data is sorted and unique. data is a list of lists, not a rectangular matrix.

The loop:

for row in range(len(data)):
    for column in data[row]:
        x[row][column_data.index(column)] = 1

Is it possible to parallelise this loop? Filling in a 70,000 x 10,000 matrix is quite slow without parallelisation.

Tim asked Feb 19 '26 19:02



1 Answer

Here's an almost vectorized approach -

lens = [len(item) for item in data]    
A = np.concatenate((column_data,np.concatenate(data)))
_,idx = np.unique(A,return_inverse=True)

R = np.repeat(np.arange(len(lens)),lens)
C = idx[len(column_data):]

out = np.zeros((len(data), len(column_data)))    
out[R,C] = 1
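
To make the index arrays concrete, here is a small sketch that runs the same steps on the example data from the question and prints R and C. The names `flat` and `uniq` are introduced only for illustration. It assumes, as the question states, that column_data is sorted and unique, and additionally that every row of data is non-empty.

```python
import numpy as np

column_data = ['a', 'b', 'c', 'd', 'e', 'f', 'x']
data = [['a', 'b', 'c'],
        ['d', 'c'],
        ['e', 'f', 'd', 'x']]

lens = [len(item) for item in data]        # row lengths: [3, 2, 4]
flat = np.concatenate(data)                # all labels, in row order

# Prepending column_data ensures every column label is in the unique set,
# so positions in the unique array match positions in column_data.
A = np.concatenate((column_data, flat))
uniq, idx = np.unique(A, return_inverse=True)

R = np.repeat(np.arange(len(lens)), lens)  # row index of each label
C = idx[len(column_data):]                 # column index of each label

print(R.tolist())  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(C.tolist())  # [0, 1, 2, 3, 2, 4, 5, 3, 6]
```

Each (R[i], C[i]) pair names one cell to set, so out[R,C] = 1 fills all of them in a single indexing operation.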

Here's another -

lens = [len(item) for item in data]
R = np.repeat(np.arange(len(lens)),lens)
C = np.searchsorted(column_data,np.concatenate(data))

out = np.zeros((len(data), len(column_data)))
out[R,C] = 1
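
As a sanity check, the sketch below wraps the searchsorted version in a function and compares it with the original double loop on the example data. The function name `fill_indicator` is hypothetical. Note that np.searchsorted only returns the right positions when column_data is sorted and every label in data occurs in it; the sketch also assumes each row is non-empty.

```python
import numpy as np

def fill_indicator(data, column_data):
    """Hypothetical helper: build the 0/1 matrix with np.searchsorted.

    Assumes column_data is sorted and unique, every label in data
    appears in column_data, and each row of data is non-empty.
    """
    lens = [len(item) for item in data]
    R = np.repeat(np.arange(len(data)), lens)
    C = np.searchsorted(column_data, np.concatenate(data))
    out = np.zeros((len(data), len(column_data)))
    out[R, C] = 1
    return out

column_data = ['a', 'b', 'c', 'd', 'e', 'f', 'x']
data = [['a', 'b', 'c'],
        ['d', 'c'],
        ['e', 'f', 'd', 'x']]

# Baseline: the nested loop from the question.
expected = np.zeros((len(data), len(column_data)))
for row, labels in enumerate(data):
    for label in labels:
        expected[row, column_data.index(label)] = 1

result = fill_indicator(data, column_data)
print(np.array_equal(result, expected))  # True
```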
Divakar answered Feb 21 '26 08:02
