Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Indexing numpy array by a numpy array of coordinates

Suppose we have

  • an n-dimensional numpy.array A
  • a numpy.array B with dtype=int and shape of (n, m)

How do I index A by B so that the result is an array of shape (m,), with values taken from the positions indicated by the columns of B?

For example, consider this code that does what I want when B is a python list:

>>> a = np.arange(27).reshape(3,3,3)
>>> a[[0, 1, 2], [0, 0, 0], [1, 1, 2]]
array([ 1, 10, 20])    # the result we're after
>>> bl = [[0, 1, 2], [0, 0, 0], [1, 1, 2]]
>>> a[bl]
array([ 1, 10, 20])   # also works when indexing with a python list
>>> a[bl].shape
(3,)

However, when B is a numpy array, the result is different:

>>> b = np.array(bl)
>>> a[b].shape
(3, 3, 3, 3)

Now, I can get the desired result by casting B into a tuple, but surely that cannot be the proper/idiomatic way to do it?

>>> a[tuple(b)]
array([ 1, 10, 20])

Is there a numpy function to achieve the same without casting B to a tuple?

like image 676
Sami Liedes Avatar asked Nov 18 '17 20:11

Sami Liedes


People also ask

Can you index a NumPy array?

Array indexing is the same as accessing an array element. You can access an array element by referring to its index number. The indexes in NumPy arrays start with 0, meaning that the first element has index 0, and the second has index 1 etc.

How do you find the index of an element of a NumPy array?

Using ndenumerate() function to find the Index of value It is usually used to find the first occurrence of the element in the given numpy array.

How do I add NumPy array to NumPy array?

You can use numpy. append() function to add an element in a NumPy array. You can pass the NumPy array and multiple values as arguments to the append() function. It doesn't modify the existing array but returns a copy of the passed array with given values added.

Is NumPy indexing fast?

Indexing in NumPy is a reasonably fast operation.


1 Answers

One alternative would be converting to linear indices and then index with np.take or index into its flattened version -

np.take(a,np.ravel_multi_index(b, a.shape))
a.flat[np.ravel_multi_index(b, a.shape)]

Custom np.ravel_multi_index for performance boost

We could implement a custom version to simulate the behaviour of np.ravel_multi_index to boost the performance, like so -

def ravel_index(b, shp):
    return np.concatenate((np.asarray(shp[1:])[::-1].cumprod()[::-1],[1])).dot(b)

Using it, the desired output would be found in two ways -

np.take(a,ravel_index(b, a.shape))
a.flat[ravel_index(b, a.shape)]

Benchmarking

Additionall incorporating tuple based method from the question and map based one from @Kanak's post.

Case #1 : dims = 3

In [23]: a = np.random.randint(0,9,([20]*3))

In [24]: b = np.random.randint(0,20,(a.ndim,1000000))

In [25]: %timeit a[tuple(b)]
    ...: %timeit a[map(np.ravel, b)]  
    ...: %timeit np.take(a,np.ravel_multi_index(b, a.shape))
    ...: %timeit a.flat[np.ravel_multi_index(b, a.shape)]
    ...: %timeit np.take(a,ravel_index(b, a.shape))
    ...: %timeit a.flat[ravel_index(b, a.shape)]
100 loops, best of 3: 6.56 ms per loop
100 loops, best of 3: 6.58 ms per loop
100 loops, best of 3: 6.95 ms per loop
100 loops, best of 3: 9.17 ms per loop
100 loops, best of 3: 6.31 ms per loop
100 loops, best of 3: 8.52 ms per loop

Case #2 : dims = 6

In [29]: a = np.random.randint(0,9,([10]*6))

In [30]: b = np.random.randint(0,10,(a.ndim,1000000))

In [31]: %timeit a[tuple(b)]
    ...: %timeit a[map(np.ravel, b)]  
    ...: %timeit np.take(a,np.ravel_multi_index(b, a.shape))
    ...: %timeit a.flat[np.ravel_multi_index(b, a.shape)]
    ...: %timeit np.take(a,ravel_index(b, a.shape))
    ...: %timeit a.flat[ravel_index(b, a.shape)]
10 loops, best of 3: 40.9 ms per loop
10 loops, best of 3: 40 ms per loop
10 loops, best of 3: 20 ms per loop
10 loops, best of 3: 29.9 ms per loop
100 loops, best of 3: 15.7 ms per loop
10 loops, best of 3: 25.8 ms per loop

Case #3 : dims = 10

In [32]: a = np.random.randint(0,9,([4]*10))

In [33]: b = np.random.randint(0,4,(a.ndim,1000000))

In [34]: %timeit a[tuple(b)]
    ...: %timeit a[map(np.ravel, b)]  
    ...: %timeit np.take(a,np.ravel_multi_index(b, a.shape))
    ...: %timeit a.flat[np.ravel_multi_index(b, a.shape)]
    ...: %timeit np.take(a,ravel_index(b, a.shape))
    ...: %timeit a.flat[ravel_index(b, a.shape)]
10 loops, best of 3: 60.7 ms per loop
10 loops, best of 3: 60.1 ms per loop
10 loops, best of 3: 27.8 ms per loop
10 loops, best of 3: 38 ms per loop
100 loops, best of 3: 18.7 ms per loop
10 loops, best of 3: 29.3 ms per loop

So, it makes sense to look for alternatives when working with higher-dimensional inputs and with large data.

like image 167
Divakar Avatar answered Nov 05 '22 14:11

Divakar