Numpy random choice to produce a 2D-array with all unique values

I am wondering whether there is a more efficient way to generate a 2-D array with np.random.choice in which each row has unique values.

For example, for an array with shape (3,4), we expect an output of:

# Expected output given a shape (3,4)
array([[0, 1, 3, 2],
       [2, 3, 1, 0],
       [1, 3, 2, 0]])

This means that each row must contain unique values drawn from the column indices. For the (3,4) example, every row of the output holds the integers 0 to 3, each exactly once.

I know that I can achieve it by passing False to the replace argument. But I can only do it for each row and not for the whole matrix. For instance, I can do this:

>>> np.random.choice(4, size=(1,4), replace=False)
array([[0, 2, 3, 1]])

But when I try to do this:

>>> np.random.choice(4, size=(3,4), replace=False)

I get an error like this:

 File "<stdin>", line 1, in <module>
 File "mtrand.pyx", line 1150, in mtrand.RandomState.choice (numpy\random\mtrand\mtrand.c:18113)
 ValueError: Cannot take a larger sample than population when 'replace=False'

I assume it's because it's trying to draw 3 x 4 = 12 samples due to the size of the matrix without replacement but I'm only giving it a limit of 4.
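That assumption can be checked directly. The sketch below is only an illustration: `try_sample` is a hypothetical helper name, not part of NumPy. It shows that the draw succeeds when the total number of requested samples fits within the population and fails otherwise.

```python
import numpy as np

def try_sample(population, size):
    """Return True if sampling without replacement succeeds, else False."""
    try:
        np.random.choice(population, size=size, replace=False)
        return True
    except ValueError:
        # Raised when prod(size) exceeds the population size.
        return False

ok_single_row = try_sample(4, (1, 4))   # 4 draws from 4 values
ok_full_matrix = try_sample(4, (3, 4))  # 12 draws from 4 values
```

So `size` is treated as the total number of samples taken from one shared population, not as a per-row request.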

I know that I can solve it by using a for-loop:

 >>> a = (np.random.choice(4,size=4,replace=False) for _ in range(3))
 >>> np.vstack(list(a))  # recent NumPy versions require a sequence here, not a generator
 array([[3, 1, 2, 0],
        [1, 2, 0, 3],
        [2, 0, 3, 1]])

But I wanted to know if there's a workaround without using any for-loops? (I'm kinda assuming that adding for-loops might make it slower if I have a number of rows greater than 1000. But as you can see I am actually creating a generator in a so I'm also not sure if it has an effect after all.)

asked Aug 01 '17 12:08 by Lj Miranda



1 Answer

One trick I have often used is to generate an array of random floats and argsort it: the sort indices along each row form a random permutation, which gives the required unique numbers. Thus, we could do -

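import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # m, n are the number of rows, cols of output.
    # Argsorting random floats along `axis` gives a random permutation of its indices.
    return np.random.rand(m, n).argsort(axis=axis)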

Sample runs -

In [98]: random_choice_noreplace(3,7)
Out[98]: 
array([[0, 4, 3, 2, 6, 5, 1],
       [5, 1, 4, 6, 0, 2, 3],
       [6, 1, 0, 4, 5, 3, 2]])

In [99]: random_choice_noreplace(5,7, axis=0) # unique nums along cols
Out[99]: 
array([[0, 2, 4, 4, 1, 0, 2],
       [1, 4, 3, 2, 4, 1, 3],
       [3, 1, 1, 3, 2, 3, 0],
       [2, 3, 0, 0, 0, 2, 4],
       [4, 0, 2, 1, 3, 4, 1]])
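To confirm that the output really has the required property, a small check can be run on it. This is a minimal sketch: `rows_are_permutations` is a hypothetical helper introduced here for illustration, and `random_choice_noreplace` is repeated from above so the block runs on its own.

```python
import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # Same function as in the answer above.
    return np.random.rand(m, n).argsort(axis=axis)

def rows_are_permutations(arr):
    """True if every row of `arr` contains each of 0..ncols-1 exactly once."""
    n = arr.shape[1]
    # Sorting a permutation of 0..n-1 must give exactly arange(n).
    return bool(np.all(np.sort(arr, axis=1) == np.arange(n)))

out = random_choice_noreplace(3, 7)
row_check = rows_are_permutations(out)

# With axis=0 the columns are the permutations, so check the transpose.
out_cols = random_choice_noreplace(5, 7, axis=0)
col_check = rows_are_permutations(out_cols.T)
```

Both `row_check` and `col_check` should come out as True for any shape, since argsort always returns each index along the chosen axis exactly once.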

Runtime test -

# Original approach
def loopy_app(m,n):
    a = (np.random.choice(n,size=n,replace=False) for _ in range(m))
    # list() because recent NumPy versions require a sequence, not a generator
    return np.vstack(list(a))

Timings -

In [108]: %timeit loopy_app(1000,100)
10 loops, best of 3: 20.6 ms per loop

In [109]: %timeit random_choice_noreplace(1000,100)
100 loops, best of 3: 3.66 ms per loop
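
As a possible alternative to the argsort trick (not part of the original answer), newer NumPy releases provide `Generator.permuted`, which shuffles each slice along an axis independently. The sketch below assumes NumPy 1.20 or later; `permuted_rows` and its `seed` parameter are hypothetical names chosen for this example, and no timing is claimed for it.

```python
import numpy as np

def permuted_rows(m, n, seed=None):
    """Return an (m, n) array whose rows are independent permutations of 0..n-1."""
    rng = np.random.default_rng(seed)
    # Start from m identical rows [0, 1, ..., n-1].
    base = np.tile(np.arange(n), (m, 1))
    # Shuffle each row independently; a new array is returned, `base` is untouched.
    return rng.permuted(base, axis=1)

result = permuted_rows(3, 4)
```

It could be timed against the two functions above with `%timeit permuted_rows(1000,100)` to see how it compares on a given machine.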

answered Oct 13 '22 19:10 by Divakar