
Python random sample of two arrays, but matching indices

I have two numpy arrays x and y, which have length 10,000. I would like to plot a random subset of 1,000 entries of both x and y. Is there an easy way to use the lovely, compact random.sample(population, k) on both x and y to select the same corresponding indices? (The y and x vectors are linked by a function y(x) say.)

Thanks.

Cokes asked Oct 21 '13 03:10



4 Answers

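You can use np.random.choice to draw an array of indices without replacement and apply it to both arrays:

import numpy as np

# 1000 distinct positions in [0, len(x)), used to index both arrays
idx = np.random.choice(len(x), 1000, replace=False)
x_sample = x[idx]
y_sample = y[idx]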
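
A minimal sketch of the same idea using NumPy's newer Generator interface (np.random.default_rng), which the NumPy documentation recommends for new code. The helper name paired_sample, the seed argument, and the sine test data are illustrative assumptions, not part of the question.

```python
import numpy as np

def paired_sample(x, y, k, seed=None):
    """Return k entries of x and y taken at the same random positions."""
    x = np.asarray(x)
    y = np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rng = np.random.default_rng(seed)
    # k distinct indices, so no data point is picked twice
    idx = rng.choice(len(x), size=k, replace=False)
    return x[idx], y[idx]

# Hypothetical data standing in for the linked x and y(x) arrays
x = np.linspace(0.0, 10.0, 10_000)
y = np.sin(x)
x_sample, y_sample = paired_sample(x, y, 1000, seed=0)
```

The two returned arrays can be passed straight to a plotting call, since entry i of x_sample still corresponds to entry i of y_sample.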

Jaime answered Oct 20 '22 02:10


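Just zip the two together and use that as the population. On Python 3, wrap the zip in list(), because random.sample needs a sequence rather than an iterator:

import random

random.sample(list(zip(xs, ys)), 1000)

The result will be a list of 1000 pairs (2-tuples) of corresponding entries from xs and ys.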
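
For plotting, the sampled pairs usually need to be split back into separate x and y sequences. A small sketch of that step, assuming made-up data in xs and ys (the squares are only a placeholder for the real y(x)); zip is wrapped in list() because random.sample expects a sequence on Python 3:

```python
import random

# Placeholder data: ys is linked to xs by y = x**2
xs = list(range(10_000))
ys = [v * v for v in xs]

# Sample 1000 (x, y) pairs without replacement
pairs = random.sample(list(zip(xs, ys)), 1000)

# "Unzip" the pairs into two tuples that stay aligned
xs_sample, ys_sample = zip(*pairs)
```

This works on NumPy arrays too, but it builds a Python list of all 10,000 pairs first, so index-based sampling is likely the lighter option for large arrays.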

DaoWen answered Oct 20 '22 01:10


After testing the numpy.random.choice solution, I found it was very slow for larger arrays.

numpy.random.randint should be much faster.

Example:

import numpy as np

x = np.arange(1e8)
y = np.arange(1e8)
# randint draws with replacement, so indices may repeat
idx = np.random.randint(0, x.shape[0], 10000)
x_sample, y_sample = x[idx], y[idx]

StoneLin answered Oct 20 '22 01:10


Note that numpy.random.randint draws each index independently, i.e. with replacement, so the same data point can be selected more than once.
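
A short sketch that makes the difference visible by counting distinct indices under both schemes. It uses the Generator methods integers and choice; the sizes n and k and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 1000  # population size and sample size (illustrative)

# Independent draws: the same index can come up more than once
idx_with = rng.integers(0, n, size=k)

# Draws without replacement: all k indices are distinct
idx_without = rng.choice(n, size=k, replace=False)

n_unique_with = len(np.unique(idx_with))
n_unique_without = len(np.unique(idx_without))
print(n_unique_with, n_unique_without)
```

The second count always equals k, while the first can fall below it whenever an index repeats. For a quick scatter plot a few repeated points may not matter, but for statistics on the subset they usually do.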

bananenpampe answered Oct 20 '22 01:10