Better way to shuffle two numpy arrays in unison

People also ask

How do I shuffle two NumPy arrays together?

Suppose we have two arrays with the same length (the same leading dimension), and we want to shuffle both so that corresponding elements stay paired. In that case, we can use the shuffle() function from the sklearn.utils module in Python.

How do I shuffle two lists at once in Python?

Method: using zip() + shuffle() + the * operator. This takes three steps. First, the lists are zipped together using zip(). Next, the zipped pairs are shuffled with the standard library's random.shuffle(). Finally, the pairs are unzipped into separate lists using the * operator.
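The three steps can be sketched with the standard library alone. The list names and values below are made up for illustration.

```python
import random

names = ["a", "b", "c", "d"]
scores = [1, 2, 3, 4]

# Step 1: pair up corresponding elements.
pairs = list(zip(names, scores))
# Step 2: shuffle the pairs in place.
random.shuffle(pairs)
# Step 3: unzip; zip(*pairs) yields tuples, so convert back to lists.
names, scores = (list(t) for t in zip(*pairs))
```

Each name still sits at the same index as its original score; only the order of the pairs has changed.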

How do I shuffle a NumPy array?

You can use numpy.random.shuffle(). It shuffles the array in place, and for a multi-dimensional array it shuffles only along the first axis.
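A small sketch of the first-axis behaviour, using a made-up 4x3 array: the rows change places, but the contents of each row stay together.

```python
import numpy as np

m = np.arange(12).reshape(4, 3)   # rows: [0, 1, 2], [3, 4, 5], ...
result = np.random.shuffle(m)     # shuffles m in place and returns None

# Each row is still three consecutive integers; only the row order changed.
rows_intact = all((row == row[0] + np.arange(3)).all() for row in m)
```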

What is a correct method to join two or more arrays in NumPy?

Use numpy.concatenate() to merge the contents of two or more arrays into a single array. The function takes the arrays to join, along with optional arguments such as the axis, and returns a NumPy ndarray.

You can use NumPy's array indexing:

import numpy

def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    # One random permutation of the indices, applied to both arrays.
    p = numpy.random.permutation(len(a))
    return a[p], b[p]

This creates separate, unison-shuffled copies of the arrays; the originals are left unchanged.
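A quick usage sketch, with made-up input arrays chosen so that the pairing is easy to check (row i of a starts with 2*i, and b[i] is i):

```python
import numpy as np

def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    return a[p], b[p]

a = np.arange(10).reshape(5, 2)   # row i is [2*i, 2*i + 1]
b = np.arange(5)                  # b[i] == i

a_s, b_s = unison_shuffled_copies(a, b)

# Rows of a_s still line up with the entries of b_s.
pairs_kept = bool((a_s[:, 0] == 2 * b_s).all())
```

Because indexing with an integer array returns a copy, a and b themselves keep their original order.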

import numpy as np
from sklearn.utils import shuffle

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
X, y = shuffle(X, y, random_state=0)

To learn more, see http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html

Your "scary" solution does not appear scary to me. Calling shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only "random" elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to shuffle(), so the whole algorithm will generate the same permutation.
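The code being discussed is not reproduced here. A minimal sketch of the idea, assuming it saves NumPy's global random state and restores it before the second shuffle, could look like this (the arrays are made up for illustration):

```python
import numpy as np

a = np.arange(10)
b = np.arange(10) * 10            # b[i] == 10 * a[i] before shuffling

state = np.random.get_state()     # remember the generator state
np.random.shuffle(a)
np.random.set_state(state)        # rewind to the same state
np.random.shuffle(b)              # same length, so same permutation
```

After both calls, b is still ten times a element by element, which shows that the two shuffles applied the same permutation.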

If you don't like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.

Example: Let's assume the arrays a and b look like this:

a = numpy.array([[[  0.,   1.,   2.],
                  [  3.,   4.,   5.]],

                 [[  6.,   7.,   8.],
                  [  9.,  10.,  11.]],

                 [[ 12.,  13.,  14.],
                  [ 15.,  16.,  17.]]])

b = numpy.array([[ 0.,  1.],
                 [ 2.,  3.],
                 [ 4.,  5.]])

We can now construct a single array containing all the data:

c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[  0.,   1.,   2.,   3.,   4.,   5.,   0.,   1.],
#        [  6.,   7.,   8.,   9.,  10.,  11.,   2.,   3.],
#        [ 12.,  13.,  14.,  15.,  16.,  17.,   4.,   5.]])

Now we create views simulating the original a and b:

a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)

The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).

In production code, you would of course try to avoid creating the original a and b at all and right away create c, a2 and b2.
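Putting the steps together, here is a self-contained sketch that also checks the two claims made above: that a2 and b2 share memory with c, and that shuffling c keeps the rows paired. The variable split is a name introduced here for readability.

```python
import numpy as np

a = np.arange(18, dtype=float).reshape(3, 2, 3)
b = np.arange(6, dtype=float).reshape(3, 2)

# One row per sample: a's 6 values followed by b's 2 values.
c = np.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]

split = a.size // len(a)          # number of columns that belong to a
a2 = c[:, :split].reshape(a.shape)
b2 = c[:, split:].reshape(b.shape)

views_ok = np.shares_memory(a2, c) and np.shares_memory(b2, c)

np.random.shuffle(c)              # reorders rows of c, and thus a2 and b2

# In the original data a[k, 0, 0] == 6*k and b[k, 0] == 2*k.
paired = bool((a2[:, 0, 0] / 6 == b2[:, 0] / 2).all())
```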

This solution could be adapted to the case that a and b have different dtypes.
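One possible adaptation, offered as an assumption rather than as the method the answer had in mind, is a structured array with one field per original array. Each field keeps its own dtype, and accessing a field gives a view into the combined array. The field names "a" and "b" and the name rec are chosen here for illustration.

```python
import numpy as np

a = np.arange(18, dtype=np.float64).reshape(3, 2, 3)
b = np.arange(6, dtype=np.int64).reshape(3, 2)

# One record per sample; each field has its own dtype and per-sample shape.
rec = np.dtype([("a", a.dtype, a.shape[1:]), ("b", b.dtype, b.shape[1:])])
c = np.empty(len(a), dtype=rec)
c["a"] = a
c["b"] = b

# Field access returns views into c, not copies.
a2, b2 = c["a"], c["b"]

# Reorder the records in place; the right-hand side is a copy, so this is safe.
rng = np.random.default_rng()
c[:] = c[rng.permutation(len(c))]
```

Here a2 stays float64 and b2 stays int64, and both follow the reordering of c.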

Very simple solution:

randomize = np.arange(len(x))
np.random.shuffle(randomize)
x = x[randomize]
y = y[randomize]

The two arrays x and y are now both shuffled with the same random permutation.
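If a reproducible shuffle is needed, the same idea works with NumPy's Generator API. This is a sketch with made-up data; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded for repeatable results

x = np.arange(6)
y = x * 10                             # y[i] == 10 * x[i]

perm = rng.permutation(len(x))         # one permutation for both arrays
x, y = x[perm], y[perm]
```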

James posted an sklearn solution in 2015, which is helpful. However, he passed a random_state argument, which is not required. In the code below, random_state is left at its default, so NumPy's global random state is used.

import numpy as np
from sklearn.utils import shuffle

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
X, y = shuffle(X, y)

from numpy.random import permutation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data    # NumPy array
y = iris.target  # NumPy array

# Data is currently unshuffled; we should shuffle
# each X[i] with its corresponding y[i]
perm = permutation(len(X))
X = X[perm]
y = y[perm]

