Forgive me if this is redundant or super basic. I'm coming to Python/Numpy from R and having a hard time flipping things around in my head.

I have an n-dimensional array which I want to sort using another n-dimensional array of index values. I know I could wrap this in a loop, but it seems like there should be a really concise Numpyonic way of beating this into submission. Here's my example code to set up the problem where n=2:

<pre class="prettyprint"><code>from numpy import array, random, take

a1 = random.standard_normal(size=[2,5])
index = array([[0,1,2,4,3], [0,1,2,3,4]])
</code></pre>

So now I have a 2 x 5 array of random numbers and a 2 x 5 index. I've read the help for <code>take()</code> about 10 times now but my brain is not grokking it, obviously. I thought this might get me there:

<pre class="prettyprint"><code>take(a1, index)

array([[ 0.29589188, -0.71279375, -0.18154864, -1.12184984,  0.25698875],
       [ 0.29589188, -0.71279375, -0.18154864,  0.25698875, -1.12184984]])
</code></pre>

but that's clearly reordering only the first element of <code>a1</code>, for both rows of the index (I presume because of flattening). Any tips on how I get from where I am to a solution that sorts element 0 of <code>a1</code> by element 0 of the index ... element n?

After playing with this some more today, I figured out that if I used a mapper function along with <code>take</code> I could solve the 2-dimensional version really simply, like this:

<pre class="prettyprint"><code>from numpy import array, random, take

a1 = random.standard_normal(size=[2,5])
index = array([[0,1,2,4,3], [0,1,2,3,4]])
list(map(take, a1, index))
</code></pre>

I needed to <code>map()</code> the <code>take()</code> to each element in <code>a1</code>.

Of course, the accepted answer solves the n-dimensional version. However, in retrospect I determined that I don't really need the n-dimensional solution, only the 2-D version.
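For anyone trying the row-by-row approach today, here is a minimal sketch of the same 2-D idea. It assumes Python 3 and NumPy 1.15 or later; the seeded generator and the names `rows`, `result` and `result_vectorised` are mine, purely for illustration. It also compares against <code>np.take_along_axis</code>, which was not part of the original discussion.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only so the run is repeatable
a1 = rng.standard_normal(size=(2, 5))
index = np.array([[0, 1, 2, 4, 3],
                  [0, 1, 2, 3, 4]])

# Pair each row of a1 with the matching row of index and call np.take on the pair.
rows = list(map(np.take, a1, index))
result = np.array(rows)  # stack the reordered rows back into a 2 x 5 array

# Loop-free equivalent for the same 2-D case (assumes NumPy >= 1.15).
result_vectorised = np.take_along_axis(a1, index, axis=1)
```

Both results should agree; the mapped version is arguably easier to read, while the vectorised call avoids a Python-level loop over rows.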

<strike>I can't think of how to work this in N dimensions yet, but</strike> here is the 2D version:

<pre class="prettyprint"><code>>>> a = np.random.standard_normal(size=(2,5))
>>> a
array([[ 0.72322499, -0.05376714, -0.28316358,  1.43025844, -0.90814293],
       [ 0.7459107 ,  0.43020728,  0.05411805, -0.32813465,  2.38829386]])
>>> i = np.array([[0,1,2,4,3],[0,1,2,3,4]])
>>> a[np.arange(a.shape[0])[:,np.newaxis],i]
array([[ 0.72322499, -0.05376714, -0.28316358, -0.90814293,  1.43025844],
       [ 0.7459107 ,  0.43020728,  0.05411805, -0.32813465,  2.38829386]])
</code></pre>

Here is the N-dimensional version:

<pre class="prettyprint"><code>>>> a[tuple(np.ogrid[tuple(slice(x) for x in a.shape)][:-1]) + (i,)]
</code></pre>

Here's how it works:

Ok, let's start with a 3-dimensional array for illustration.

<pre class="prettyprint"><code>>>> import numpy as np
>>> a = np.arange(24).reshape((2,3,4))
>>> a
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
</code></pre>

You can access elements of this array by specifying the index along each axis as follows:

<pre class="prettyprint"><code>>>> a[0,1,2]
6
</code></pre>

This is equivalent to <code>a[0][1][2]</code>, which is how you would access the same element if we were dealing with a list instead of an array.

Numpy allows you to get even fancier when indexing arrays:

<pre class="prettyprint"><code>>>> a[[0,1],[1,1],[2,2]]
array([ 6, 18])
>>> a[[0,1],[1,2],[2,2]]
array([ 6, 22])
</code></pre>

These examples would be equivalent to <code>[a[0][1][2],a[1][1][2]]</code> and <code>[a[0][1][2],a[1][2][2]]</code> if we were dealing with lists.

You can even leave out repeated indices and numpy will figure out what you want.
For example, the expressions above could equivalently be written:

<pre class="prettyprint"><code>>>> a[[0,1],1,2]
array([ 6, 18])
>>> a[[0,1],[1,2],2]
array([ 6, 22])
</code></pre>

The shape of the array (or list) you index with in each dimension only affects the shape of the returned array. In other words, numpy doesn't care that you are trying to index your array with an array of shape <code>(2,3,4)</code> when it's pulling values, except that it will feed you back an array of shape <code>(2,3,4)</code>. For example:

<pre class="prettyprint"><code>>>> a[[[0,0],[0,0]],[[0,0],[0,0]],[[0,0],[0,0]]]
array([[0, 0],
       [0, 0]])
</code></pre>

In this case, we're grabbing the same element, <code>a[0,0,0]</code>, over and over again, but numpy is returning an array with the same shape as we passed in.

Ok, onto your problem. What you want is to index the array along the last axis with the numbers in your <code>index</code> array. So, for the example in your question you would like <code>[[a[0,0],a[0,1],a[0,2],a[0,4],a[0,3]],[a[1,0],a[1,1],...</code>

The fact that your index array is multidimensional, like I said earlier, doesn't tell numpy anything about where you want to pull these indices from; it just specifies the shape of the output array. So, in your example, you need to tell numpy that the first 5 values are to be pulled from <code>a[0]</code> and the latter 5 from <code>a[1]</code>. Easy!

<pre class="prettyprint"><code>>>> a[[[0]*5,[1]*5],index]
</code></pre>

It gets complicated in N dimensions, but let's do it for the 3-dimensional array <code>a</code> I defined way above. Suppose we have the following index array:

<pre class="prettyprint"><code>>>> i = np.array(list(range(4))[::-1]*6).reshape(a.shape)
>>> i
array([[[3, 2, 1, 0],
        [3, 2, 1, 0],
        [3, 2, 1, 0]],

       [[3, 2, 1, 0],
        [3, 2, 1, 0],
        [3, 2, 1, 0]]])
</code></pre>

So, these values are all for indices along the last axis.
We need to tell numpy what indices along the first and second axes these numbers are to be taken from; i.e. we need to tell numpy that the indices for the first axis are:

<pre class="prettyprint"><code>i1 = [[[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]],

      [[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]]]
</code></pre>

and the indices for the second axis are:

<pre class="prettyprint"><code>i2 = [[[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]],

      [[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]]]
</code></pre>

Then we can just do:

<pre class="prettyprint"><code>>>> a[i1,i2,i]
array([[[ 3,  2,  1,  0],
        [ 7,  6,  5,  4],
        [11, 10,  9,  8]],

       [[15, 14, 13, 12],
        [19, 18, 17, 16],
        [23, 22, 21, 20]]])
</code></pre>

The handy numpy function which generates <code>i1</code> and <code>i2</code> is called <code>np.mgrid</code>. I use <code>np.ogrid</code> in my answer, which is equivalent in this case because of the numpy magic (broadcasting) I talked about earlier.

Hope that helps!
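To make the <code>np.mgrid</code> / <code>np.ogrid</code> equivalence concrete, here is a small sketch on the same 3-dimensional array. The names `dense`, `sparse`, `o1`, `o2` and `builtin` are mine, for illustration only, and the final comparison against <code>np.take_along_axis</code> assumes NumPy 1.15 or later; that function is not what the answer above uses.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))
# Index array that reverses the order along the last axis everywhere.
i = np.tile(np.arange(4)[::-1], 6).reshape(a.shape)

# Dense grids: i1 and i2 each have the full shape (2, 3, 4).
# mgrid also produces a grid for the last axis, which is not needed here.
i1, i2, _ = np.mgrid[0:2, 0:3, 0:4]
dense = a[i1, i2, i]

# Open grids: shapes (2, 1, 1) and (1, 3, 1), broadcast against i when indexing.
o1, o2, _ = np.ogrid[0:2, 0:3, 0:4]
sparse = a[o1, o2, i]

# Built-in equivalent for indexing along one axis (assumes NumPy >= 1.15).
builtin = np.take_along_axis(a, i, axis=-1)
```

All three results should be identical; the open grids simply avoid materialising the full-size <code>i1</code> and <code>i2</code> arrays.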

Numpy: Sorting a multidimensional array by a multidimensional array

Tags:

python

slice

numpy


JD Long


2 Answers

JD Long

user545424
