Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using numpy.take for faster fancy indexing

Tags:

python

numpy

EDIT I have kept the more complicated problem I am facing below, but my problems with np.take can be summarized better as follows. Say you have an array img of shape (planes, rows), and another array lut of shape (planes, 256), and you want to use them to create a new array out of shape (planes, rows), where out[p,j] = lut[p, img[p, j]]. This can be achieved with fancy indexing as follows:

In [4]: %timeit lut[np.arange(planes).reshape(-1, 1), img]
1000 loops, best of 3: 471 us per loop

But if, instead of fancy indexing, you use take and a python loop over the planes things can be sped up tremendously:

In [6]: %timeit for _ in (lut[j].take(img[j]) for j in xrange(planes)) : pass
10000 loops, best of 3: 59 us per loop

Can lut and img be in someway rearranged, so as to have the whole operation happen without python loops, but using numpy.take (or an alternative method) instead of conventional fancy indexing to keep the speed advantage?


ORIGINAL QUESTION I have a set of look-up tables (LUTs) that I want to use on an image. The array holding the LUTs is of shape (planes, 256, n), and the image has shape (planes, rows, cols). Both are of dtype = 'uint8', matching the 256 axis of the LUT. The idea is to run the p-th plane of the image through each of the n LUTs from the p-th plane of the LUT.

If my lut and img are the following:

planes, rows, cols, n = 3, 4000, 4000, 4
lut = np.random.randint(-2**31, 2**31 - 1,
                        size=(planes * 256 * n // 4,)).view('uint8')
lut = lut.reshape(planes, 256, n)
img = np.random.randint(-2**31, 2**31 - 1,
                    size=(planes * rows * cols // 4,)).view('uint8')
img = img.reshape(planes, rows, cols)

I can achieve what I am after using fancy indexing like this

out = lut[np.arange(planes).reshape(-1, 1, 1), img]

which gives me an array of shape (planes, rows, cols, n) , where out[i, :, :, j] holds the i-th plane of img run through the j-th LUT of the i-th plane of the LUT...

All is good, except for this:

In [2]: %timeit lut[np.arange(planes).reshape(-1, 1, 1), img]
1 loops, best of 3: 5.65 s per loop

which is completely unacceptable, especially since I have all of the following not so nice looking alternatives using np.take than run much faster:

  1. A single LUT on a single plane runs about x70 faster:

    In [2]: %timeit np.take(lut[0, :, 0], img[0])
    10 loops, best of 3: 78.5 ms per loop
    
  2. A python loop running through all the desired combinations finishes almost x6 faster:

    In [2]: %timeit for _ in (np.take(lut[j, :, k], img[j]) for j in xrange(planes) for k in xrange(n)) : pass
    1 loops, best of 3: 947 ms per loop
    
  3. Even running all combinations of planes in the LUT and image and then discarding the planes**2 - planes unwanted ones is faster than fancy indexing:

    In [2]: %timeit np.take(lut, img, axis=1)[np.arange(planes), np.arange(planes)]
    1 loops, best of 3: 3.79 s per loop
    
  4. And the fastest combination I have been able to come up with has a python loop iterating over the planes and finishes x13 faster:

    In [2]: %timeit for _ in (np.take(lut[j], img[j], axis=0) for j in xrange(planes)) : pass
    1 loops, best of 3: 434 ms per loop
    

The question of course is if there is no way of doing this with np.take without any python loop? Ideally whatever reshaping or resizing is needed should happen on the LUT, not the image, but I am open to whatever you people can come up with...

like image 306
Jaime Avatar asked Jan 23 '13 23:01

Jaime


People also ask

Is NumPy indexing fast?

Indexing in NumPy is a reasonably fast operation.

What is NumPy fancy indexing?

Fancy indexing is conceptually simple: it means passing an array of indices to access multiple array elements at once. For example, consider the following array: import numpy as np rand = np. random. RandomState(42) x = rand.

Is NumPy indexing slow?

Numpy is fast with operating on vectors (matrices), when performed on the whole structure at once. Such single element-by-element operations are slow.

Does NumPy make Python faster?

NumPy Arrays are faster than Python Lists because of the following reasons: An array is a collection of homogeneous data-types that are stored in contiguous memory locations. On the other hand, a list in Python is a collection of heterogeneous data types stored in non-contiguous memory locations.


1 Answers

Fist of all I have to say I really liked your question. Without rearranging LUT or IMG the following solution worked:

%timeit a=np.take(lut, img, axis=1)
# 1 loops, best of 3: 1.93s per loop

But from the result you have to query the diagonal: a[0,0], a[1,1], a[2,2]; to get what you want. I've tried to find a way to do this indexing only for the diagonal elements, but still did not manage.

Here are some ways to rearrange your LUT and IMG: The following works if the indexes in IMG are from 0-255, for the 1st plane, 256-511 for the 2nd plane, and 512-767 for the 3rd plane, but that would prevent you from using 'uint8', which can be a big issue...:

lut2 = lut.reshape(-1,4)
%timeit np.take(lut2,img,axis=0)
# 1 loops, best of 3: 716 ms per loop
# or
%timeit np.take(lut2, img.flatten(), axis=0).reshape(3,4000,4000,4)
# 1 loops, best of 3: 709 ms per loop

in my machine your solution is still the best option, and very adequate since you just need the diagonal evaluations, i.e. plane1-plane1, plane2-plane2 and plane3-plane3:

%timeit for _ in (np.take(lut[j], img[j], axis=0) for j in xrange(planes)) : pass
# 1 loops, best of 3: 677 ms per loop

I hope this can give you some insight about a better solution. It would be nice to look for more options with flatten(), and similar methods as np.apply_over_axes() or np.apply_along_axis(), that seem to be promising.

I used this code below to generate the data:

import numpy as np
num = 4000
planes, rows, cols, n = 3, num, num, 4
lut = np.random.randint(-2**31, 2**31-1,size=(planes*256*n//4,)).view('uint8')
lut = lut.reshape(planes, 256, n)
img = np.random.randint(-2**31, 2**31-1,size=(planes*rows*cols//4,)).view('uint8')
img = img.reshape(planes, rows, cols)
like image 107
Saullo G. P. Castro Avatar answered Oct 05 '22 02:10

Saullo G. P. Castro