
Numpy: Sorting a multidimensional array by a multidimensional array

Forgive me if this is redundant or super basic. I'm coming to Python/Numpy from R and having a hard time flipping things around in my head.

I have an n-dimensional array which I want to reorder using another n-dimensional array of index values. I know I could wrap this in a loop, but it seems like there should be a really concise Numpyonic way of beating this into submission. Here's my example code to set up the problem where n=2:

from numpy import array, random, take
a1 = random.standard_normal(size=[2,5])
index = array([[0,1,2,4,3], [0,1,2,3,4]])

So now I have a 2 x 5 array of random numbers and a 2 x 5 index. I've read the help for take() about 10 times now, but my brain is not grokking it, obviously.

I thought this might get me there:

take(a1, index)

array([[ 0.29589188, -0.71279375, -0.18154864, -1.12184984,  0.25698875],
       [ 0.29589188, -0.71279375, -0.18154864,  0.25698875, -1.12184984]])

but that's clearly pulling values only from the first element (row) of a1, I presume because take() flattens the array.
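A small sketch of why this happens, using a made-up integer array so the effect is easy to see (the names `demo` and `idx` are illustrative and not from the question). When no `axis` argument is given, `np.take` indexes into the flattened array, so every index between 0 and 4 lands in the first row:

```python
import numpy as np

demo = np.arange(10).reshape(2, 5)      # rows: [0..4] and [5..9]
idx = np.array([[0, 1, 2, 4, 3],
                [0, 1, 2, 3, 4]])

# With no axis given, take() treats demo as demo.ravel(),
# so both output rows are drawn from the first row of demo.
flat_take = np.take(demo, idx)
print(flat_take)
# [[0 1 2 4 3]
#  [0 1 2 3 4]]
```

None of the values 5 through 9 from the second row appear in the result.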

Any tips on how I get from where I am to a solution that sorts element 0 of a1 by element 0 of the index, and so on through element n?

JD Long asked Jun 06 '12 20:06



2 Answers

After playing with this some more today, I figured out that if I used a mapper function along with take, I could solve the 2-dimensional version really simply, like this:

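from numpy import array, random, take
a1 = random.standard_normal(size=[2,5])
index = array([[0,1,2,4,3], [0,1,2,3,4]])
array(list(map(take, a1, index)))  # list() is needed on Python 3, where map() is lazy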

I needed to map() take() over each row of a1, pairing it with the matching row of index.

Of course, the accepted answer solves the n-dimensional version. However, in retrospect I determined that I don't really need the n-dimensional solution, only the 2-D version.

JD Long answered Oct 14 '22 23:10


I can't think of how to work this in N dimensions yet, but here is the 2D version:

>>> a = np.random.standard_normal(size=(2,5))
>>> a
array([[ 0.72322499, -0.05376714, -0.28316358,  1.43025844, -0.90814293],
       [ 0.7459107 ,  0.43020728,  0.05411805, -0.32813465,  2.38829386]])
>>> i = np.array([[0,1,2,4,3],[0,1,2,3,4]]) 
>>> a[np.arange(a.shape[0])[:,np.newaxis],i]
array([[ 0.72322499, -0.05376714, -0.28316358, -0.90814293,  1.43025844],
       [ 0.7459107 ,  0.43020728,  0.05411805, -0.32813465,  2.38829386]])

Here is the N-dimensional version:

>>> a[tuple(np.ogrid[tuple(slice(x) for x in a.shape)][:-1]) + (i,)]
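For what it's worth, NumPy 1.15 and later also ship `np.take_along_axis`, which appears to cover the same case directly. A minimal sketch, assuming a reasonably recent NumPy; the arrays `arr` and `idx` are illustrative stand-ins, and the `ogrid` lines are a tuple-based spelling of the expression above:

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)
# An index array with arr's shape that reverses the last axis everywhere.
idx = np.tile(np.arange(4)[::-1], (2, 3, 1))

# Open-grid index arrays for every axis except the last, plus idx for the last.
grids = np.ogrid[tuple(slice(n) for n in arr.shape)]
via_ogrid = arr[tuple(grids[:-1]) + (idx,)]

# Same result with the built-in helper.
via_helper = np.take_along_axis(arr, idx, axis=-1)

print(np.array_equal(via_ogrid, via_helper))  # True
```

The index is passed as a tuple because recent NumPy versions no longer treat a plain list of arrays as a multi-axis index.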

Here's how it works:

Ok, let's start with a 3 dimensional array for illustration.

>>> import numpy as np
>>> a = np.arange(24).reshape((2,3,4))
>>> a
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

You can access elements of this array by specifying the index along each axis as follows:

>>> a[0,1,2]
6

This is equivalent to a[0][1][2] which is how you would access the same element if we were dealing with a list instead of an array.

NumPy allows you to get even fancier when indexing arrays:

>>> a[[0,1],[1,1],[2,2]]
array([ 6, 18])
>>> a[[0,1],[1,2],[2,2]]
array([ 6, 22])

These examples would be equivalent to [a[0][1][2],a[1][1][2]] and [a[0][1][2],a[1][2][2]] if we were dealing with lists.

You can even leave out repeated indices and numpy will figure out what you want. For example, the above examples could be equivalently written:

>>> a[[0,1],1,2]
array([ 6, 18])
>>> a[[0,1],[1,2],2]
array([ 6, 22])
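What happens here is broadcasting: the scalar indices are stretched to match the list indices. A short sketch that makes the stretched index arrays visible with `np.broadcast_arrays` (the names `b0`, `b1`, `b2` are only for illustration):

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Scalars are broadcast against the list index, so these two forms match.
short_form = a[[0, 1], 1, 2]
long_form = a[[0, 1], [1, 1], [2, 2]]
print(short_form, long_form)  # [ 6 18] [ 6 18]

# np.broadcast_arrays shows the index arrays NumPy effectively uses.
b0, b1, b2 = np.broadcast_arrays([0, 1], 1, 2)
print(b1.tolist(), b2.tolist())  # [1, 1] [2, 2]
```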

The shape of the array (or list) you slice with in each dimension only affects the shape of the returned array. In other words, numpy doesn't care that you are trying to index your array with an array of shape (2,3,4) when it's pulling values, except that it will feed you back an array of shape (2,3,4). For example:

>>> a[[[0,0],[0,0]],[[0,0],[0,0]],[[0,0],[0,0]]]
array([[0, 0],
       [0, 0]])

In this case, we're grabbing the same element, a[0,0,0] over and over again, but numpy is returning an array with the same shape as we passed in.
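A slightly less degenerate sketch of the same point, with index arrays that pick different elements (the arrays `idx0`, `idx1`, `idx2` are made up for illustration). The output takes the shape of the index arrays, and each output entry is `a[idx0[j,k], idx1[j,k], idx2[j,k]]`:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Three index arrays of shape (2, 2), one per axis of a.
idx0 = np.array([[0, 1], [0, 1]])
idx1 = np.array([[0, 0], [2, 2]])
idx2 = np.array([[3, 3], [1, 1]])

picked = a[idx0, idx1, idx2]
print(picked.shape)  # (2, 2)
print(picked)
# [[ 3 15]
#  [ 9 21]]
```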

Ok, onto your problem. What you want is to index the array along the last axis with the numbers in your index array. So, for the example in your question you would like [[a[0,0],a[0,1],a[0,2],a[0,4],a[0,3]],[a[1,0],a[1,1],...]].

The fact that your index array is multidimensional, like I said earlier, doesn't tell numpy anything about where you want to pull these indices from; it just specifies the shape of the output array. So, in your example, you need to tell numpy that the first 5 values are to be pulled from a[0] and the latter 5 from a[1]. Easy!

>>> a1[[[0]*5,[1]*5],index]
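A runnable sketch of that 2D case, using the array names from the question. It also shows that a column vector of row numbers gives the same result as writing the row numbers out in full (the names `rows_explicit` and `rows_broadcast` are illustrative):

```python
import numpy as np

a1 = np.random.standard_normal(size=(2, 5))
index = np.array([[0, 1, 2, 4, 3],
                  [0, 1, 2, 3, 4]])

# Explicit row numbers: the first five results come from row 0, the rest from row 1.
rows_explicit = [[0] * 5, [1] * 5]

# A column vector of row numbers broadcasts to the same (2, 5) shape.
rows_broadcast = np.arange(a1.shape[0])[:, np.newaxis]

explicit = a1[rows_explicit, index]
broadcast = a1[rows_broadcast, index]
print(np.array_equal(explicit, broadcast))  # True
```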

It gets complicated in N dimensions, but let's do it for the 3 dimensional array a I defined way above. Suppose we have the following index array:

>>> i = np.array(list(range(4))[::-1]*6).reshape(a.shape)
>>> i
array([[[3, 2, 1, 0],
        [3, 2, 1, 0],
        [3, 2, 1, 0]],

       [[3, 2, 1, 0],
        [3, 2, 1, 0],
        [3, 2, 1, 0]]])

So, these values are all for indices along the last axis. We need to tell numpy what indices along the first and second axes these numbers are to be taken from; i.e. we need to tell numpy that the indices for the first axis are:

i1 = [[[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]],

      [[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]]]

and the indices for the second axis are:

i2 = [[[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]],

      [[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]]]

Then we can just do:

>>> a[i1,i2,i]
array([[[ 3,  2,  1,  0],
        [ 7,  6,  5,  4],
        [11, 10,  9,  8]],

       [[15, 14, 13, 12],
        [19, 18, 17, 16],
        [23, 22, 21, 20]]])

The handy NumPy object that generates i1 and i2 is np.mgrid. I use np.ogrid in my answer, which is equivalent in this case because NumPy broadcasts the smaller index arrays to the full shape, just as it did with the repeated indices earlier.
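A brief sketch comparing the two on the 3-dimensional example. The `_dense` and `_open` names are illustrative, and `i` is rebuilt here with `np.tile` so the block runs on its own:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
i = np.tile(np.arange(4)[::-1], (2, 3, 1))

# Dense grids: each has the full shape (2, 3, 4), like i1 and i2 above.
i1_dense, i2_dense, _ = np.mgrid[0:2, 0:3, 0:4]

# Open grids: shapes (2, 1, 1) and (1, 3, 1), broadcast during indexing.
i1_open, i2_open, _ = np.ogrid[0:2, 0:3, 0:4]

print(i1_dense.shape, i1_open.shape)  # (2, 3, 4) (2, 1, 1)
print(np.array_equal(a[i1_dense, i2_dense, i], a[i1_open, i2_open, i]))  # True
```

The open grids hold far fewer elements, which is one reason to prefer np.ogrid when the index arrays only exist to be broadcast.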

Hope that helps!

user545424 answered Oct 14 '22 22:10