I have a 2D array, say, a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], ... [21, 22, 23, 24]], and I would like to pick N elements from each row at random, according to a probability distribution p that can be different for each row.
So basically, I'd like to do something like [np.random.choice(a[i], N, p=p_arr[i]) for i in range(a.shape[0])] without using a loop, where p_arr is a 2D array of the same shape as a that stores the probability distribution for each row.
The reason I want to avoid a for loop is that a line profiler shows the loop is slowing my code down considerably (I work with large arrays).
Is there a more Pythonic way of doing this?
I checked out these links (here and here) but they don't answer my question.
Thank you!
An example of what I'd like to do without the loop:
>>> import numpy as np
>>> a = np.ones([500, 500])
>>> p_arr = np.identity(a.shape[0])
>>> for i in range(a.shape[0]):
...     a[i] = a[i] * np.arange(a.shape[0])
...
>>> [print(np.random.choice(a[i], p=p_arr[i])) for i in range(a.shape[0])]
A list comprehension (still a Python-level loop, but with less per-iteration overhead) may be enough to address the issue:
import numpy as np

shape = (10, 10)
N = 4

# one probability distribution per row, normalized to sum to 1
distributions = np.random.rand(*shape)
distributions = distributions / np.sum(distributions, axis=1)[:, None]

# values to sample from: row i holds 10*i .. 10*i + 9
values = np.arange(shape[0] * shape[1]).reshape(shape)

# draw N values from each row according to that row's distribution
sample = np.array([np.random.choice(v, N, p=r) for v, r in zip(values, distributions)])
output:
print(np.round(distributions,2))
[[0.03 0.22 0.1 0.09 0.2 0.1 0.11 0.05 0.08 0.01]
[0.04 0.12 0.13 0.03 0.16 0.22 0.16 0.05 0. 0.09]
[0.15 0.04 0.08 0.07 0.17 0.13 0.01 0.15 0.1 0.1 ]
[0.06 0.13 0.16 0.03 0.17 0.09 0.08 0.11 0.05 0.12]
[0.07 0.08 0.09 0.08 0.13 0.18 0.12 0.13 0.07 0.07]
[0.1 0.04 0.11 0.06 0.04 0.16 0.18 0.15 0.01 0.15]
[0.06 0.09 0.17 0.08 0.14 0.15 0.09 0.01 0.06 0.15]
[0.03 0.1 0.11 0.07 0.14 0.14 0.15 0.1 0.04 0.11]
[0.05 0.1 0.18 0.1 0.03 0.18 0.12 0.05 0.05 0.13]
[0.13 0.1 0.08 0.11 0.06 0.14 0.11 0. 0.14 0.14]]
print(sample)
[[ 6 4 8 5]
[16 19 15 10]
[25 20 24 23]
[37 34 30 31]
[41 44 46 45]
[59 55 53 57]
[64 63 65 61]
[79 75 76 77]
[85 81 83 88]
[99 96 93 90]]
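The comprehension above still loops in Python. For sampling with replacement, the per-row draws can be fully vectorized with inverse-CDF sampling: build each row's cumulative distribution and locate uniform draws in it via broadcasting. A sketch (the variable names and the clipping guard are mine, not from the answer above):

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (10, 10)
N = 4

# same setup as above: one normalized distribution per row
distributions = rng.random(shape)
distributions /= distributions.sum(axis=1, keepdims=True)
values = np.arange(shape[0] * shape[1]).reshape(shape)

# inverse-CDF sampling, vectorized over rows:
# count how many CDF entries each uniform draw exceeds -> column index
cdf = np.cumsum(distributions, axis=1)
u = rng.random((shape[0], N))
idx = (u[..., None] > cdf[:, None, :]).sum(axis=2)
# guard against floating-point rounding in the last CDF entry
idx = np.clip(idx, 0, shape[1] - 1)
sample = np.take_along_axis(values, idx, axis=1)
```

This replaces the Python-level loop with a handful of array operations, at the cost of an intermediate (rows, N, columns) boolean array from the broadcasted comparison.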
If you want non-repeating samples within each row, there is another kind of optimization you can try. By flattening the values and the distributions, you can draw a non-repeating shuffle of the indexes of the whole matrix according to the combined distributions. Within the flattened distribution, the values that belong to the same row keep, as a group, a distribution equivalent to that row's. This means that if you reassemble the shuffled indexes onto their original rows while keeping their shuffled order stable, you can take a slice of the shuffled matrix to obtain your sample:
# flatten the per-row distributions into one global distribution
flatDist = distributions.reshape((distributions.size,))
flatDist = flatDist / np.sum(flatDist)

# shuffle all matrix indexes without replacement, weighted by flatDist
randomIdx = np.random.choice(np.arange(values.size), flatDist.size, replace=False, p=flatDist)

# convert flat indexes back to (row, column) pairs
shuffleIdx = np.array([randomIdx // shape[1], randomIdx % shape[1]])

# regroup by row while preserving the shuffled order within each row
shuffleIdx = shuffleIdx[:, np.argsort(shuffleIdx[0, :], kind="stable")]

# the first N columns of each reassembled row are the sample
sample = values[tuple(shuffleIdx)].reshape(shape)[:, :N]
output:
print(sample)
[[ 3 7 2 5]
[13 12 14 16]
[27 23 25 29]
[37 31 33 36]
[47 45 48 49]
[59 50 52 54]
[62 61 60 66]
[72 78 70 77]
[87 82 83 86]
[92 98 95 93]]
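Another fully vectorized option for the no-replacement case is the Gumbel-top-k trick: perturb each row's log-probabilities with independent Gumbel noise and keep the N largest keys per row, which is equivalent to drawing N times from that row's distribution without replacement. A hedged sketch (this technique is my addition, not part of the answer above):

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (10, 10)
N = 4

# same setup as above: one normalized distribution per row
distributions = rng.random(shape)
distributions /= distributions.sum(axis=1, keepdims=True)
values = np.arange(shape[0] * shape[1]).reshape(shape)

# Gumbel-top-k: the N largest values of log(p) + Gumbel noise give a
# sample without replacement distributed as successive draws from p
keys = np.log(distributions) + rng.gumbel(size=shape)
topN = np.argsort(-keys, axis=1)[:, :N]  # column indices of the N largest keys per row
sample = np.take_along_axis(values, topN, axis=1)
```

Unlike the flattening approach, this samples each row independently and needs no global shuffle, so it scales to large matrices with only an argsort per row (np.argpartition could replace the argsort if the within-sample order does not matter).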