Using numpy.take for faster fancy indexing

Tags:

EDIT I have kept the more complicated problem I am facing below, but my problems with np.take can be summarized better as follows. Say you have an array img of shape (planes, rows), and another array lut of shape (planes, 256), and you want to use them to create a new array out of shape (planes, rows), where out[p,j] = lut[p, img[p, j]]. This can be achieved with fancy indexing as follows:

In [4]: %timeit lut[np.arange(planes).reshape(-1, 1), img]
1000 loops, best of 3: 471 us per loop

But if, instead of fancy indexing, you use take and a python loop over the planes things can be sped up tremendously:

In [6]: %timeit for _ in (lut[j].take(img[j]) for j in xrange(planes)) : pass
10000 loops, best of 3: 59 us per loop

Can lut and img be in someway rearranged, so as to have the whole operation happen without python loops, but using numpy.take (or an alternative method) instead of conventional fancy indexing to keep the speed advantage?

ORIGINAL QUESTION I have a set of look-up tables (LUTs) that I want to use on an image. The array holding the LUTs is of shape (planes, 256, n), and the image has shape (planes, rows, cols). Both are of dtype = 'uint8', matching the 256 axis of the LUT. The idea is to run the p-th plane of the image through each of the n LUTs from the p-th plane of the LUT.

If my lut and img are the following:

planes, rows, cols, n = 3, 4000, 4000, 4
lut = np.random.randint(-2**31, 2**31 - 1,
                        size=(planes * 256 * n // 4,)).view('uint8')
lut = lut.reshape(planes, 256, n)
img = np.random.randint(-2**31, 2**31 - 1,
                    size=(planes * rows * cols // 4,)).view('uint8')
img = img.reshape(planes, rows, cols)

I can achieve what I am after using fancy indexing like this

out = lut[np.arange(planes).reshape(-1, 1, 1), img]

which gives me an array of shape (planes, rows, cols, n) , where out[i, :, :, j] holds the i-th plane of img run through the j-th LUT of the i-th plane of the LUT...

All is good, except for this:

In [2]: %timeit lut[np.arange(planes).reshape(-1, 1, 1), img]
1 loops, best of 3: 5.65 s per loop

which is completely unacceptable, especially since I have all of the following not so nice looking alternatives using np.take than run much faster:

A single LUT on a single plane runs about x70 faster:

In [2]: %timeit np.take(lut[0, :, 0], img[0])
10 loops, best of 3: 78.5 ms per loop

A python loop running through all the desired combinations finishes almost x6 faster:

In [2]: %timeit for _ in (np.take(lut[j, :, k], img[j]) for j in xrange(planes) for k in xrange(n)) : pass
1 loops, best of 3: 947 ms per loop

Even running all combinations of planes in the LUT and image and then discarding the planes**2 - planes unwanted ones is faster than fancy indexing:
```
In [2]: %timeit np.take(lut, img, axis=1)[np.arange(planes), np.arange(planes)]
1 loops, best of 3: 3.79 s per loop
```
And the fastest combination I have been able to come up with has a python loop iterating over the planes and finishes x13 faster:
```
In [2]: %timeit for _ in (np.take(lut[j], img[j], axis=0) for j in xrange(planes)) : pass
1 loops, best of 3: 434 ms per loop
```

The question of course is if there is no way of doing this with np.take without any python loop? Ideally whatever reshaping or resizing is needed should happen on the LUT, not the image, but I am open to whatever you people can come up with...

306

asked Jan 23 '13 23:01

Jaime

1 Answers

Fist of all I have to say I really liked your question. Without rearranging LUT or IMG the following solution worked:

%timeit a=np.take(lut, img, axis=1)
# 1 loops, best of 3: 1.93s per loop

But from the result you have to query the diagonal: a[0,0], a[1,1], a[2,2]; to get what you want. I've tried to find a way to do this indexing only for the diagonal elements, but still did not manage.

Here are some ways to rearrange your LUT and IMG: The following works if the indexes in IMG are from 0-255, for the 1st plane, 256-511 for the 2nd plane, and 512-767 for the 3rd plane, but that would prevent you from using 'uint8', which can be a big issue...:

lut2 = lut.reshape(-1,4)
%timeit np.take(lut2,img,axis=0)
# 1 loops, best of 3: 716 ms per loop
# or
%timeit np.take(lut2, img.flatten(), axis=0).reshape(3,4000,4000,4)
# 1 loops, best of 3: 709 ms per loop

in my machine your solution is still the best option, and very adequate since you just need the diagonal evaluations, i.e. plane1-plane1, plane2-plane2 and plane3-plane3:

%timeit for _ in (np.take(lut[j], img[j], axis=0) for j in xrange(planes)) : pass
# 1 loops, best of 3: 677 ms per loop

I hope this can give you some insight about a better solution. It would be nice to look for more options with flatten(), and similar methods as np.apply_over_axes() or np.apply_along_axis(), that seem to be promising.

I used this code below to generate the data:

import numpy as np
num = 4000
planes, rows, cols, n = 3, num, num, 4
lut = np.random.randint(-2**31, 2**31-1,size=(planes*256*n//4,)).view('uint8')
lut = lut.reshape(planes, 256, n)
img = np.random.randint(-2**31, 2**31-1,size=(planes*rows*cols//4,)).view('uint8')
img = img.reshape(planes, rows, cols)

107

answered Oct 05 '22 02:10

Saullo G. P. Castro

Related questions
                            
                                pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused
                            
                                WebDriverException: Message: The command 'GET /session/7.../displayed' was not found while Explicit Wait with safaridriver and Selenium 3.13.0
                            
                                Graph optimizations on a tensorflow serveable created using tf.Estimator
                            
                                Data structure for arrays which share some elements -- Python
                            
                                how is total loss calculated over multiple classes in Keras?
                            
                                Solving PDE with implicit euler in python - incorrect output
                            
                                Putting .SVG images into tkinter Frame
                            
                                How to export a plotly dashboard app into a html standalone file to share with the others?
                            
                                django-paypal setup
                            
                                How to correlate two time series with gaps and different time bases?
                            
                                GUI development with IronPython and Visual Studio 2010
                            
                                How to create a custom Python exception type in C extension?
                            
                                What can change my floating point control word behind my back?
                            
                                Using the python multiprocessing module for IO with pygame on Mac OS 10.7
                            
                                Logic game: maximising (or minimising) the chances for two agents to meet
                            
                                UnknownTimezoneError Exception Raised with Python Application Compiled with Py2Exe
                            
                                Django. Thread safe update or create.
                            
                                scikits learn and nltk: Naive Bayes classifier performance highly different
                            
                                Python: curses key codes to readable (vim-like?) syntax
                            
                                Does it make sense to modify in-place AND return a copy?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Using numpy.take for faster fancy indexing

Tags:

python

numpy

Jaime

People also ask

1 Answers

Saullo G. P. Castro

Recent Activity

Donate For Us