
Best way to permute contents of each column in numpy

What's the best way to efficiently permute the contents of each column in a numpy array?

What I have is something like:

>>> arr = np.arange(16).reshape((4, 4))
>>> arr
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

>>> # Shuffle each column independently to obtain something like
array([[  8,  5, 10,  7],
       [ 12,  1,  6,  3],
       [  4,  9, 14, 11],
       [  0, 13,  2, 15]])

asked by nopper, Dec 15 '14 at 14:12



2 Answers

If your array is multi-dimensional, np.random.permutation permutes along the first axis (rows) by default:

>>> np.random.permutation(arr)
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [ 0,  1,  2,  3],
       [12, 13, 14, 15]])

However, this shuffles the row indices and so each column has the same (random) ordering.
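
A quick way to check this behaviour is sketched below. It assumes the newer Generator API (np.random.default_rng) rather than the legacy np.random functions used above; the seed and the variable names are only illustrative.

```python
import numpy as np

# Assumption: NumPy's Generator API; the seed is arbitrary.
rng = np.random.default_rng(0)
arr = np.arange(16).reshape(4, 4)

# permutation reorders whole rows along the first axis and returns a copy
shuffled = rng.permutation(arr)

# Every row of the result is still one of the original rows, so all
# columns share the same ordering.
rows_intact = all(any((row == orig).all() for orig in arr) for row in shuffled)
print(rows_intact)
```

Because rows move as a unit, the entries in any row of the result still differ by exactly 1 from left to right, just as in the original array.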

The simplest way of shuffling each column independently could be to loop over the columns and use np.random.shuffle to shuffle each one in place:

for i in range(arr.shape[1]):
    # arr[:, i] is a view, so shuffling it modifies arr in place
    np.random.shuffle(arr[:, i])

Which gives, for instance:

array([[12,  1, 14, 11],
       [ 4,  9, 10,  7],
       [ 8,  5,  6, 15],
       [ 0, 13,  2,  3]])

This method can be useful if you have a very large array that you don't want to copy, because each column is permuted in place. On the other hand, even simple Python loops can be slow, and there are quicker NumPy methods, such as the one provided by @jme.
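
If a recent NumPy is available (1.20 or later, which is an assumption about your environment), the Generator API also offers permuted, which shuffles each slice along an axis independently without an explicit Python loop. A minimal sketch:

```python
import numpy as np

# Assumption: NumPy >= 1.20, where Generator.permuted is available.
rng = np.random.default_rng(seed=0)
arr = np.arange(16).reshape(4, 4)

# Returns a shuffled copy; with axis=0 each column is shuffled independently.
shuffled = rng.permuted(arr, axis=0)

# Passing out=arr shuffles the original array in place instead.
rng.permuted(arr, axis=0, out=arr)
```

Each column of the result holds the same values as the corresponding original column, only in a different order.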

answered by Alex Riley, Oct 13 '22 at 13:10


Here's another way of doing this:

def permute_columns(x):
    ix_i = np.random.sample(x.shape).argsort(axis=0)
    ix_j = np.tile(np.arange(x.shape[1]), (x.shape[0], 1))
    return x[ix_i, ix_j]

A quick test:

>>> x = np.arange(16).reshape(4,4)
>>> permute_columns(x)
array([[ 8,  9,  2,  3],
       [ 0,  5, 10, 11],
       [ 4, 13, 14,  7],
       [12,  1,  6, 15]])

The idea is to generate a bunch of random numbers, then argsort them within each column independently. This produces a random permutation of each column's indices.

Note that this has sub-optimal asymptotic time complexity: the sort takes O(n m log m) time for an array with m rows and n columns, whereas shuffling each column directly takes O(n m). But since Python's for loops are pretty slow, you actually get better performance for all but very tall matrices.
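
The same argsort idea can also be written with np.take_along_axis (available since NumPy 1.15), which avoids building the column index array by hand. This is only a sketch; the function name permute_columns_take and the optional rng parameter are made up for illustration.

```python
import numpy as np

def permute_columns_take(x, rng=None):
    """Return a copy of the 2-D array x with each column shuffled independently."""
    # Hypothetical helper: rng is an optional np.random.Generator.
    rng = np.random.default_rng() if rng is None else rng
    # Argsorting random keys within each column gives one random
    # permutation of the row indices per column.
    order = rng.random(x.shape).argsort(axis=0)
    # Pick x[order[i, j], j] for every position (i, j).
    return np.take_along_axis(x, order, axis=0)

x = np.arange(16).reshape(4, 4)
result = permute_columns_take(x, np.random.default_rng(0))
```

The input array is left untouched, so this variant trades extra memory for the vectorised speed.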

answered by jme, Oct 13 '22 at 12:10