Numpy random choice to produce a 2D-array with all unique values

I am wondering whether there is a more efficient way to generate a 2-D array with np.random.choice in which each row has unique values.

For example, for an array with shape (3,4), we expect an output of:

# Expected output given a shape (3,4)
array([[0, 1, 3, 2],
       [2, 3, 1, 0],
       [1, 3, 2, 0]])

This means that the values within each row must be unique, and their range is set by the number of columns: with 4 columns, each row should contain only the integers 0 to 3, each exactly once.

I know I can get unique values by passing replace=False, but that only works for one row at a time, not for the whole matrix. For instance, this works:

>>> np.random.choice(4, size=(1,4), replace=False)
array([[0, 2, 3, 1]])

But when I try to do this:

>>> np.random.choice(4, size=(3,4), replace=False)

I get an error like this:

 File "<stdin>", line 1, in <module>
 File "mtrand.pyx", line 1150, in mtrand.RandomState.choice (numpy\random\mtrand\mtrand.c:18113)
 ValueError: Cannot take a larger sample than population when 'replace=False'

I assume this is because the call tries to draw 3 x 4 = 12 samples without replacement, to fill the whole matrix, from a population of only 4 values.
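The size constraint can be checked directly. The sketch below is only an illustration (the helper name can_sample is hypothetical, not part of NumPy): with replace=False, the population has to be at least as large as the total number of requested samples.

```python
import numpy as np

def can_sample(population, size):
    """Return True if np.random.choice(population, size, replace=False) succeeds."""
    try:
        np.random.choice(population, size=size, replace=False)
        return True
    except ValueError:
        # Raised when the requested sample count exceeds the population.
        return False

print(can_sample(4, (1, 4)))   # 4 samples from 4 values   -> True
print(can_sample(4, (3, 4)))   # 12 samples from 4 values  -> False
print(can_sample(12, (3, 4)))  # 12 samples from 12 values -> True
```

Note that the last call succeeds only because the values are unique across the whole matrix (0 to 11), which is not the per-row uniqueness asked for here.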

I know that I can solve it by using a for-loop:

 >>> a = (np.random.choice(4,size=4,replace=False) for _ in range(3))
 >>> np.vstack(a)
 array([[3, 1, 2, 0],
        [1, 2, 0, 3],
        [2, 0, 3, 1]])

But is there a workaround that avoids for-loops altogether? (I am assuming a loop would make things slower once the number of rows exceeds 1000. Then again, a here is actually a generator, so I am not sure how much that matters.)

asked Aug 01 '17 12:08

Lj Miranda

1 Answer

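One trick I have used often is to generate an array of random floats and argsort it. Along the chosen axis, the sort order of n random floats is a random permutation of the indices 0 to n-1, which are exactly the unique numbers required. Thus, we could do -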

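import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # m, n are the number of rows, cols of output;
    # argsort along `axis` turns random floats into a random permutation of indices
    return np.random.rand(m, n).argsort(axis=axis)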

Sample runs -

In [98]: random_choice_noreplace(3,7)
Out[98]: 
array([[0, 4, 3, 2, 6, 5, 1],
       [5, 1, 4, 6, 0, 2, 3],
       [6, 1, 0, 4, 5, 3, 2]])

In [99]: random_choice_noreplace(5,7, axis=0) # unique nums along cols
Out[99]: 
array([[0, 2, 4, 4, 1, 0, 2],
       [1, 4, 3, 2, 4, 1, 3],
       [3, 1, 1, 3, 2, 3, 0],
       [2, 3, 0, 0, 0, 2, 4],
       [4, 0, 2, 1, 3, 4, 1]])
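The uniqueness property can be checked programmatically rather than by eye. This is a minimal sketch: rows_are_permutations is a hypothetical helper introduced here for illustration, and the function is repeated so the snippet runs on its own.

```python
import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # Same idea as above: rank independent uniform draws along `axis`.
    return np.random.rand(m, n).argsort(axis=axis)

def rows_are_permutations(arr):
    """Hypothetical helper: True if every row holds 0..n-1 exactly once."""
    n = arr.shape[1]
    return bool((np.sort(arr, axis=1) == np.arange(n)).all())

out = random_choice_noreplace(1000, 100)
print(out.shape)                      # (1000, 100)
print(rows_are_permutations(out))     # True

# With axis=0 the columns are the permutations, so check the transpose.
cols = random_choice_noreplace(5, 7, axis=0)
print(rows_are_permutations(cols.T))  # True
```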

Runtime test -

# Original approach
def loopy_app(m,n):
    # Build a list: newer NumPy versions expect a sequence in np.vstack, not a generator
    a = [np.random.choice(n,size=n,replace=False) for _ in range(m)]
    return np.vstack(a)

Timings -

In [108]: %timeit loopy_app(1000,100)
10 loops, best of 3: 20.6 ms per loop

In [109]: %timeit random_choice_noreplace(1000,100)
100 loops, best of 3: 3.66 ms per loop
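To reproduce the comparison outside IPython, something like the following could be used. It is a sketch, not a benchmark result: permuted_rows is a hypothetical third variant that is not part of the answer above and assumes NumPy 1.20 or later, where Generator.permuted shuffles each slice along an axis independently. Timings will vary by machine and NumPy version.

```python
import timeit
import numpy as np

def loopy_app(m, n):
    # One np.random.choice call per row, stacked from a list.
    return np.vstack([np.random.choice(n, size=n, replace=False) for _ in range(m)])

def random_choice_noreplace(m, n, axis=-1):
    # Rank independent uniform draws along `axis`.
    return np.random.rand(m, n).argsort(axis=axis)

def permuted_rows(m, n, rng=None):
    # Hypothetical alternative: shuffle each row of a tiled arange independently.
    # Assumes NumPy >= 1.20, where Generator.permuted is available.
    rng = np.random.default_rng() if rng is None else rng
    return rng.permuted(np.tile(np.arange(n), (m, 1)), axis=1)

if __name__ == "__main__":
    for fn in (loopy_app, random_choice_noreplace, permuted_rows):
        # Best of 3 repeats, 10 calls each, reported per call.
        best = min(timeit.repeat(lambda: fn(1000, 100), number=10, repeat=3)) / 10
        print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

All three functions return an (m, n) integer array whose rows are permutations of 0 to n-1; they differ only in how the permutations are produced.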

answered Oct 13 '22 19:10

Divakar

Tags: python, arrays, numpy