I have two numpy arrays x and y, which have length 10,000. I would like to plot a random subset of 1,000 entries of both x and y. Is there an easy way to use the lovely, compact random.sample(population, k) on both x and y to select the same corresponding indices? (The y and x vectors are linked by a function y(x) say.)
Thanks.
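To shuffle both arrays simultaneously, use numpy.random.shuffle(c), where c is a single array that combines the two (for example, x and y stacked as columns). The function shuffles the rows of c in place, so each x value stays next to its y value.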
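As a rough illustration of the combined-array idea, the sketch below stacks the two arrays as columns before shuffling. The sample data, the relationship y = x**2, and the choice of np.column_stack to build c are assumptions made for the example, not part of the original snippet.

```python
import numpy as np

# Hypothetical data: y is some function of x (here y = x**2).
x = np.linspace(0.0, 1.0, 10_000)
y = x ** 2

# Combine the arrays so that each row holds one (x, y) pair.
c = np.column_stack((x, y))  # shape (10000, 2)

# Shuffle the rows in place; the pairs stay together.
np.random.shuffle(c)

# The first 1000 rows of the shuffled array form a random subset.
x_sample, y_sample = c[:1000, 0], c[:1000, 1]
```

Note that this copies both arrays into c, so it costs more memory than sampling indices directly.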
shuffle() function in Python. Suppose we have two arrays of the same length, or with the same leading dimension, and we want to shuffle them both so that corresponding elements stay paired. In that case, we can use the shuffle() function from the sklearn.utils module.
Use the numpy.random.choice() function on the row indices to pick multiple random rows from a multidimensional array.
To check if two NumPy arrays A and B are equal: Use a comparison operator (==) to form a comparison array. Check if all the elements in the comparison array are True.
You can use np.random.choice on an index array and apply it to both arrays:
import numpy as np
idx = np.random.choice(np.arange(len(x)), 1000, replace=False)
x_sample = x[idx]
y_sample = y[idx]
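For reference, here is a self-contained sketch of the same index-sampling idea using NumPy's newer Generator interface (np.random.default_rng). The sample data, the seed, and the relationship y = sin(x) are assumptions made for the example.

```python
import numpy as np

# Hypothetical data: y is some function of x (here y = sin(x)).
x = np.linspace(0.0, 10.0, 10_000)
y = np.sin(x)

# A seeded generator makes the sample reproducible.
rng = np.random.default_rng(seed=0)

# Draw 1000 distinct indices, then use them to index both arrays.
idx = rng.choice(x.size, size=1000, replace=False)
x_sample = x[idx]
y_sample = y[idx]
```

Because the same idx is used for both arrays, x_sample[i] and y_sample[i] always come from the same original position.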
Just zip the two together and use that as the population:
import random
random.sample(list(zip(xs, ys)), 1000)  # list() is required in Python 3, where zip returns an iterator
The result will be 1000 pairs (2-tuples) of corresponding entries from xs and ys.
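For plotting, the sampled pairs usually need to be split back into separate x and y sequences. The sketch below shows one way to do that; the sample data and the names pairs, x_sample and y_sample are assumptions made for the example.

```python
import random

import numpy as np

# Hypothetical data: ys is some function of xs (here ys = 2 * xs).
xs = np.arange(10_000)
ys = xs * 2

# Sample 1000 (x, y) pairs without replacement.
pairs = random.sample(list(zip(xs, ys)), 1000)

# "Unzip" the pairs back into two arrays of matching entries.
x_sample, y_sample = (np.array(col) for col in zip(*pairs))
```

Building the list of pairs walks over every element in Python, so for large arrays index-based sampling is likely to be faster.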
After testing the numpy.random.choice solution, I found it was very slow for larger arrays. numpy.random.randint should be much faster. Example:
import numpy as np
x = np.arange(1e8)
y = np.arange(1e8)
idx = np.random.randint(0, x.shape[0], 10000)  # indices are drawn with replacement
x_sample, y_sample = x[idx], y[idx]
Note that numpy.random.randint draws the indices independently, that is, with replacement, so the same data point can be selected more than once.
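To see the effect, the sketch below counts the distinct indices produced by each approach. The sizes n and k and the seeded generator are assumptions made for the example.

```python
import numpy as np

n, k = 10_000, 1000  # population size and sample size (hypothetical)

# randint draws each index independently, so repeats are possible.
idx_with_repeats = np.random.randint(0, n, size=k)

# choice with replace=False guarantees k distinct indices.
rng = np.random.default_rng(seed=0)
idx_unique = rng.choice(n, size=k, replace=False)

n_distinct_with = np.unique(idx_with_repeats).size
n_distinct_without = np.unique(idx_unique).size

print("distinct indices from randint:", n_distinct_with)
print("distinct indices from choice: ", n_distinct_without)
```

With these sizes, randint should yield roughly 950 distinct indices on average, while choice with replace=False always yields exactly 1000. Whether the repeats matter depends on the use case; for a quick scatter plot they are often harmless.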