I am wondering if there's a more efficient way to generate a 2-D array using np.random.choice where each row has unique values. For example, for an array with shape (3,4), we expect an output of:
# Expected output given a shape (3,4)
array([[0, 1, 3, 2],
       [2, 3, 1, 0],
       [1, 3, 2, 0]])
This means that the values in each row must be unique, with the range of values set by the number of columns. So for each row in out, the integers should only fall between 0 and 3.
I know that I can achieve it by passing False to the replace argument. But I can only do it for one row at a time and not for the whole matrix. For instance, I can do this:
>>> np.random.choice(4, size=(1,4), replace=False)
array([[0, 2, 3, 1]])
But when I try to do this:
>>> np.random.choice(4, size=(3,4), replace=False)
I get an error like this:
  File "<stdin>", line 1, in <module>
  File "mtrand.pyx", line 1150, in mtrand.RandomState.choice (numpy\random\mtrand\mtrand.c:18113)
ValueError: Cannot take a larger sample than population when 'replace=False'
I assume it's because it's trying to draw 3 x 4 = 12 samples without replacement, due to the size of the matrix, but I'm only giving it a population of 4.
I know that I can solve it by using a for-loop:
>>> a = (np.random.choice(4,size=4,replace=False) for _ in range(3))
>>> np.vstack(a)
array([[3, 1, 2, 0],
       [1, 2, 0, 3],
       [2, 0, 3, 1]])
But I wanted to know if there's a workaround without using any for-loops. (I'm assuming that a for-loop might make it slower when the number of rows is greater than 1000. But as you can see, I am actually creating a generator in a, so I'm also not sure whether it has an effect after all.)
One trick I have used often is generating a random array and using argsort to get unique indices as the required unique numbers. Thus, we could do -
import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # m, n are the number of rows, cols of output
    return np.random.rand(m, n).argsort(axis=axis)
Sample runs -
In [98]: random_choice_noreplace(3,7)
Out[98]:
array([[0, 4, 3, 2, 6, 5, 1],
       [5, 1, 4, 6, 0, 2, 3],
       [6, 1, 0, 4, 5, 3, 2]])
In [99]: random_choice_noreplace(5,7, axis=0) # unique nums along cols
Out[99]:
array([[0, 2, 4, 4, 1, 0, 2],
       [1, 4, 3, 2, 4, 1, 3],
       [3, 1, 1, 3, 2, 3, 0],
       [2, 3, 0, 0, 0, 2, 4],
       [4, 0, 2, 1, 3, 4, 1]])
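As a quick sanity check on the argsort trick, here is a minimal sketch that confirms every row of the output is a permutation of 0..n-1. The helper name rows_are_permutations is hypothetical and only for illustration; it is not part of NumPy.

```python
import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # Argsort of random floats yields a permutation of 0..size-1 along `axis`.
    return np.random.rand(m, n).argsort(axis=axis)

def rows_are_permutations(arr):
    # Hypothetical helper: a row is a permutation of 0..n-1
    # exactly when its sorted form equals np.arange(n).
    n = arr.shape[1]
    return bool(np.all(np.sort(arr, axis=1) == np.arange(n)))

out = random_choice_noreplace(3, 7)
print(rows_are_permutations(out))
```

For the axis=0 case, the same check can be applied to the transposed output, since uniqueness then holds along columns.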
Runtime test -
# Original approach
def loopy_app(m, n):
    a = (np.random.choice(n, size=n, replace=False) for _ in range(m))
    return np.vstack(a)
Timings -
In [108]: %timeit loopy_app(1000,100)
10 loops, best of 3: 20.6 ms per loop
In [109]: %timeit random_choice_noreplace(1000,100)
100 loops, best of 3: 3.66 ms per loop
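Another loop-free option, assuming a NumPy version that provides the Generator API with Generator.permuted (1.20 or later, as far as I know), is to tile np.arange(n) into m rows and shuffle each row independently. The function name permuted_rows and its k and seed parameters are hypothetical, added only for this sketch; I have not timed it against the argsort version.

```python
import numpy as np

def permuted_rows(m, n, k=None, seed=None):
    # Assumes NumPy >= 1.20, where Generator.permuted is available.
    rng = np.random.default_rng(seed)
    # m identical rows, each holding 0..n-1.
    base = np.tile(np.arange(n), (m, 1))
    # permuted shuffles each slice along axis=1 independently of the others.
    out = rng.permuted(base, axis=1)
    # Optionally keep only the first k columns: k unique draws per row.
    return out if k is None else out[:, :k]

print(permuted_rows(3, 4))
print(permuted_rows(3, 10, k=4))
```

Keeping only the first k columns also works with the argsort approach, which covers the case where fewer than n unique values are needed per row.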