Sampling rows in 2D numpy arrays with replacement

numpy.random.choice is a handy tool for sampling random elements from a 1D array:

In [94]: numpy.random.choice(numpy.arange(5), 10)
Out[94]: array([3, 1, 4, 3, 4, 3, 2, 4, 1, 1])

However, the docs specify that the a parameter must be one-dimensional, so numpy.random.choice cannot be used directly to get a random selection of rows from a 2D array (for example, random samples for a one-hot encoder).

So if my input is:

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])  

How can I draw n random rows from this array, like this (n = 10)?

array([[ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.]])
cs95 asked Feb 18 '26 10:02


2 Answers

As per this issue, the feature was considered in 2014, but numpy.random.choice itself has not gained it since then. There is, however, a simple workaround that combines numpy.random.choice with numpy's fancy indexing:

Starting with

In [102]: x = numpy.eye(3); x
Out[102]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

You may use numpy.random.choice to generate a list of random indices, like this:

In [103]: i = numpy.random.choice(3, 10); i
Out[103]: array([2, 2, 0, 2, 1, 1, 2, 0, 0, 1])

Then use i to index x:

In [104]: x[i]
Out[104]: 
array([[ 0.,  0.,  1.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  1.,  0.]])
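
The two steps above can be wrapped into one helper. This is only a minimal sketch: the name sample_rows and its seed argument are hypothetical and not part of numpy.

```python
import numpy as np

def sample_rows(arr, n, seed=None):
    """Return n rows of arr drawn uniformly at random, with replacement."""
    # Hypothetical helper: seeding the legacy global state is optional
    # and only here to make a run repeatable.
    if seed is not None:
        np.random.seed(seed)
    # Draw n row indices from range(arr.shape[0]); replace=True is the default.
    idx = np.random.choice(arr.shape[0], size=n)
    # Integer-array (fancy) indexing selects whole rows.
    return arr[idx]

x = np.eye(3)
sample = sample_rows(x, 10, seed=0)
print(sample.shape)  # (10, 3)
```

Since only the indices are sampled, the same helper should work for any array whose first axis enumerates the items to draw.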

With a workaround this efficient, I don't believe a change to the API is necessary.

Do note that the procedure is the same for sampling rows according to a given probability distribution: pass the probabilities for the indices through the p argument of numpy.random.choice.
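
As a rough illustration of the weighted case, the sketch below passes per-row probabilities through p. The weights are made up for the example; they need one entry per row and must sum to 1.

```python
import numpy as np

x = np.eye(3)

# Hypothetical weights: row 0 is drawn most often, row 2 least often.
p = [0.7, 0.2, 0.1]

# Sample indices according to p, then index the array with them.
idx = np.random.choice(x.shape[0], size=10, p=p)
weighted = x[idx]
print(weighted.shape)  # (10, 3)
```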

cs95 answered Feb 21 '26 14:02


Just to add another way of selecting rows from a 2-D array: numpy.random.Generator.choice. About halfway down the numpy.random.choice documentation page (https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html), it notes that "sampling random rows from a 2-D array is . . . possible with Generator.choice through its axis keyword."

This approach works with a pandas DataFrame too. The only caveat is that the result comes back as a NumPy array, which you can easily convert back to a DataFrame.

Piggy-backing off what cs95 did, you could do the following:

import numpy as np

x = np.eye(3)

# numpy.random.Generator.choice: axis=0 samples whole rows
rng = np.random.default_rng()

y = rng.choice(a=x, size=10, replace=True, axis=0)
y

array([[0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.]])
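
A small sketch of how the replace flag changes things with this approach. The seed 42 is arbitrary and only there so a run can be repeated; it is an assumption of the example, not something the answer requires.

```python
import numpy as np

x = np.eye(3)
rng = np.random.default_rng(42)  # arbitrary seed, for repeatability only

# With replacement: any number of rows can be drawn.
with_repl = rng.choice(x, size=10, replace=True, axis=0)
print(with_repl.shape)  # (10, 3)

# Without replacement: at most x.shape[0] rows, each appearing once.
perm = rng.choice(x, size=3, replace=False, axis=0)

# Asking for more distinct rows than exist raises ValueError.
try:
    rng.choice(x, size=10, replace=False, axis=0)
except ValueError as exc:
    print("cannot draw 10 distinct rows from 3:", exc)
```

For the question as asked (10 rows from a 3-row array), replace=True is therefore the setting that applies.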
GSA answered Feb 21 '26 13:02
