Numpy: Get random set of rows from 2D array

People also ask

How do I randomly select rows from NumPy array?

np.random.shuffle() shuffles the rows of an array in place. It only permutes along the first axis, so after shuffling, the first rows of the 2D array are a random selection of its original rows.
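A minimal sketch of the shuffle approach. The example array `A` and sample size `k` are illustrative assumptions. Because `np.random.shuffle` modifies its argument, this shuffles a copy so the original data is left alone:

```python
import numpy as np

# Hypothetical example data: 10 distinct rows, 3 columns
A = np.arange(30).reshape(10, 3)

# np.random.shuffle works in place, so shuffle a copy to keep A intact
shuffled = A.copy()
np.random.shuffle(shuffled)  # permutes rows only (first axis)

# The first k rows of the shuffled copy are a random sample without replacement
k = 2
sample = shuffled[:k]
print(sample)
```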

How do you randomly sample a matrix in python?

With Python's standard random module, you can use random.randint() and random.randrange() to generate random numbers, but the numbers can repeat. To get a list of unique random numbers, use random.sample() instead.
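A short illustration of the difference, using only the standard library. The range 0..9 and the count of 5 are arbitrary choices:

```python
import random

# randint draws each value independently, so values can repeat
with_repeats = [random.randint(0, 9) for _ in range(5)]

# random.sample draws k distinct items from the population (here 0..9)
unique = random.sample(range(10), 5)
print(with_repeats, unique)
```

Sampling from `range(n)` like this is also a convenient way to get distinct row indices for an array with `n` rows.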

What is random rand () function in NumPy?

numpy.random.rand() generates random float values from a uniform distribution over [0, 1). It can return a single value or an array of any shape, with the dimensions passed as separate arguments.
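For example (the 3x2 shape is just an illustration):

```python
import numpy as np

x = np.random.rand()       # a single float in [0, 1)
M = np.random.rand(3, 2)   # a 3x2 array of floats in [0, 1)
print(x)
print(M)
```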

>>> import numpy as np
>>> A = np.random.randint(5, size=(10,3))
>>> A
array([[1, 3, 0],
       [3, 2, 0],
       [0, 2, 1],
       [1, 1, 4],
       [3, 2, 2],
       [0, 1, 0],
       [1, 3, 1],
       [0, 4, 1],
       [2, 4, 2],
       [3, 3, 1]])
>>> idx = np.random.randint(10, size=2)
>>> idx
array([7, 6])
>>> A[idx,:]
array([[0, 4, 1],
       [1, 3, 1]])

Putting it together for a general case:

A[np.random.randint(A.shape[0], size=2), :]
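Because np.random.randint draws each index independently, this general form samples with replacement, so the same row can come back more than once. A quick way to see it is to ask for more rows than the array has (15 out of 10 here, an arbitrary choice), which forces at least one repeat:

```python
import numpy as np

A = np.random.randint(5, size=(10, 3))

# Indices are drawn independently, so duplicates are possible;
# drawing more indices than there are rows guarantees at least one repeat
idx = np.random.randint(A.shape[0], size=15)
rows = A[idx, :]
print(len(np.unique(idx)), "distinct rows out of", len(idx))
```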

For sampling without replacement (NumPy 1.7.0+):

A[np.random.choice(A.shape[0], 2, replace=False), :]
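On newer NumPy versions (1.17+), the Generator API from np.random.default_rng offers the same thing, and Generator.choice can sample rows directly through its axis parameter. This is a sketch of that alternative, not part of the original answer; the seed 42 is an arbitrary assumption used only for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducible results
A = rng.integers(5, size=(10, 3))

# Sample 2 distinct rows directly; axis=0 selects along the first dimension
rows = rng.choice(A, size=2, replace=False, axis=0)

# Equivalent index-based form, which also tells you which rows were chosen
idx = rng.choice(A.shape[0], size=2, replace=False)
rows_by_index = A[idx, :]
print(rows)
print(idx, rows_by_index)
```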

I do not believe there is a good way to generate a random list without replacement before 1.7. Perhaps you can set up a small function that ensures the two values are not the same.
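One way to write such a function, sketched under the assumption that np.random.permutation is available (it also exists in older NumPy releases), is to permute the row indices and keep the first k. The helper name `sample_rows_without_replacement` is hypothetical:

```python
import numpy as np

def sample_rows_without_replacement(A, k):
    """Hypothetical helper: return k distinct rows of A.

    Permuting the row indices and keeping the first k guarantees
    that no index appears twice.
    """
    if k > A.shape[0]:
        raise ValueError("k cannot exceed the number of rows")
    idx = np.random.permutation(A.shape[0])[:k]
    return A[idx, :]

A = np.arange(30).reshape(10, 3)  # example data with distinct rows
print(sample_rows_without_replacement(A, 2))
```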

This is an old post, but this is what works best for me:

A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)]

Change replace=False to replace=True to get the same thing, but with replacement.
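A small sketch contrasting the two settings. The example array and the sample sizes 4 and 12 are assumptions for illustration. Note that with replace=False you cannot ask for more rows than the array has:

```python
import numpy as np

A = np.arange(30).reshape(10, 3)  # hypothetical data with 10 distinct rows
num_rows_2_sample = 4

# replace=False: every sampled row is distinct
no_repeat = A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)]

# replace=True: rows may repeat, and you may ask for more rows than A has
with_repeat = A[np.random.choice(A.shape[0], 12, replace=True)]

# replace=False with a sample larger than the population raises ValueError
try:
    np.random.choice(A.shape[0], 12, replace=False)
    too_many_failed = False
except ValueError:
    too_many_failed = True
```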

Another option is to create a random mask if you just want to down-sample your data by a certain factor. Say I want to down-sample to 25% of my original data set, which is currently held in the array data_arr:

import numpy

# generate a random boolean mask the length of data_arr:
# False with probability 0.75, True with probability 0.25
mask = numpy.random.choice([False, True], len(data_arr), p=[0.75, 0.25])

Now data_arr[mask] returns roughly 25% of the rows, randomly sampled. Because each row is kept independently, the exact count varies from run to run.
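A runnable version of the mask approach, with a hypothetical 1000-row data_arr. For comparison, it also draws an exact 25% by choosing that many distinct indices, which is the np.random.choice approach shown earlier rather than part of the mask answer:

```python
import numpy as np

data_arr = np.random.random((1000, 4))  # hypothetical data set of 1000 rows

# Boolean mask: each row is kept independently with probability 0.25
mask = np.random.choice([False, True], len(data_arr), p=[0.75, 0.25])
subset = data_arr[mask]
print(subset.shape[0])  # close to 250, but varies from run to run

# If exactly 25% is needed, pick that many distinct indices instead
k = len(data_arr) // 4
exact = data_arr[np.random.choice(len(data_arr), k, replace=False)]
```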

This answer is similar to the one Hezi Rasheff provided, but simplified so that newer Python users can understand what's going on (I've noticed that many new data science students fetch random samples in the weirdest ways because they don't know what they are doing in Python).

You can get a number of random indices from your array by using:

indices = np.random.choice(A.shape[0], number_of_samples, replace=False)

You can then use fancy indexing with your numpy array to get the samples at those indices:

A[indices]

This will get you the specified number of random samples from your data.

I see permutation has been suggested. In fact it can be made into one line:

>>> A = np.random.randint(5, size=(10,3))
>>> np.random.permutation(A)[:2]
array([[0, 3, 0],
       [3, 1, 2]])
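Unlike np.random.shuffle, np.random.permutation returns a shuffled copy and leaves the input untouched. Keep in mind that it copies the whole array even when you only want a couple of rows. A quick check of the two behaviours on example data (the array is an assumption):

```python
import numpy as np

A = np.arange(30).reshape(10, 3)
original = A.copy()

# permutation returns a shuffled copy (rows only), leaving A untouched
sample = np.random.permutation(A)[:2]
unchanged_after_permutation = np.array_equal(A, original)  # True

# shuffle, by contrast, reorders A itself in place and returns None
result = np.random.shuffle(A)
```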

If you just need a random sample of the existing rows, you can also use the standard library's random.sample:

import random
# random.sample needs a sequence (e.g. a list of rows), not an ndarray
new_array = random.sample(list(old_array), x)

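Here x has to be an int giving the number of rows you want to randomly pick. random.sample never picks the same position twice, and it returns a list of rows; wrap the result in np.array(new_array) if you need a 2D array again.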
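A variant that stays in NumPy: sample row indices from a range with random.sample and then use fancy indexing. The array and the value of x below are illustrative assumptions:

```python
import random
import numpy as np

old_array = np.arange(30).reshape(10, 3)  # hypothetical 10-row array
x = 4

# Sampling row indices from range() avoids converting the array to a list
# and gives back a real 2D NumPy array via fancy indexing
rows = random.sample(range(old_array.shape[0]), x)
new_array = old_array[rows]
print(new_array)
```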

If you want to generate multiple random subsets of rows, for example if you're doing RANSAC, you can argsort a matrix of random numbers:

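import numpy as np

num_pop = 10        # number of rows in the population
num_samples = 2     # number of random subsets to draw
pop_in_sample = 3   # rows per subset
rows_to_sample = np.random.random([num_pop, 5])
# argsort of random numbers gives an independent random permutation per row
random_numbers = np.random.random([num_samples, num_pop])
samples = np.argsort(random_numbers, axis=1)[:, :pop_in_sample]
# will be shape [num_samples, pop_in_sample, 5]
row_subsets = rows_to_sample[samples, :]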
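Each row of argsort over independent random numbers is a random permutation of 0..num_pop-1, so the first pop_in_sample entries never repeat within a subset (though different subsets may share rows). A quick check of that property, written with a seeded Generator; the seed 0 is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
num_pop, num_samples, pop_in_sample = 10, 2, 3
rows_to_sample = rng.random((num_pop, 5))

# Each row of `samples` is the start of an independent random permutation
samples = np.argsort(rng.random((num_samples, num_pop)), axis=1)[:, :pop_in_sample]
row_subsets = rows_to_sample[samples]  # shape (num_samples, pop_in_sample, 5)

# Within each subset, no population row is chosen twice
all_distinct = all(len(set(s)) == pop_in_sample for s in samples.tolist())
print(row_subsets.shape, all_distinct)
```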

