What's the fastest way to threshold a numpy array?

I want the resulting array as a binary yes/no.

I came up with

    import numpy
    import PIL.Image

    img = PIL.Image.open(filename)

    array = numpy.array(img)
    thresholded_array = numpy.copy(array)

    brightest = numpy.amax(array)
    threshold = brightest/2

    # Visit every pixel individually (assumes a 2-D grayscale image)
    rows, cols = array.shape
    for b in range(rows):
        for c in range(cols):
            if array[b, c] > threshold:
                thresholded_array[b, c] = 255
            else:
                thresholded_array[b, c] = 0

    out = PIL.Image.fromarray(thresholded_array)

but iterating over the array one value at a time is very slow, and I know there must be a faster way. What's the fastest?

El Confuso asked Jun 26 '15 04:06


1 Answer

Instead of looping, you can compare the entire array at once in several ways. Starting from

>>> import numpy as np
>>> arr = np.random.randint(0, 256, (3, 3))
>>> brightest = arr.max()
>>> threshold = brightest // 2
>>> arr
array([[214, 151, 216],
       [206,  10, 162],
       [176,  99, 229]])
>>> brightest
229
>>> threshold
114

Method #1: use np.where:

>>> np.where(arr > threshold, 255, 0)
array([[255, 255, 255],
       [255,   0, 255],
       [255,   0, 255]])
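
One thing to watch if the result is going back into PIL, as in the question: with plain Python integers as arguments, np.where returns NumPy's default integer dtype rather than uint8, and as far as I know PIL.Image.fromarray does not accept 64-bit integer arrays. A minimal sketch, assuming an 8-bit grayscale image (the helper name threshold_to_uint8 is hypothetical, not something from NumPy or PIL):

```python
import numpy as np

def threshold_to_uint8(array):
    """Return 255 where array exceeds half its maximum, else 0, as uint8."""
    threshold = array.max() / 2
    # np.where picks 255 or 0 per element; astype converts the result to uint8
    return np.where(array > threshold, 255, 0).astype(np.uint8)

# Same sample values as above, stored as 8-bit pixels
arr = np.array([[214, 151, 216],
                [206,  10, 162],
                [176,  99, 229]], dtype=np.uint8)
result = threshold_to_uint8(arr)
print(result.dtype)
print(result)
```

The returned array should then be suitable for PIL.Image.fromarray(result) in the question's code.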

Method #2: use boolean indexing to create a new array:

>>> up = arr > threshold
>>> new_arr = np.zeros_like(arr)
>>> new_arr[up] = 255

Method #3: do the same, but use an arithmetic hack:

>>> (arr > threshold) * 255
array([[255, 255, 255],
       [255,   0, 255],
       [255,   0, 255]])

which works because False == 0 and True == 1.
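
If you want to convince yourself that the three methods agree, here is a quick check as a standalone script, using the sample array shown above (the variable names by_where, by_index and by_arithmetic are just illustrative):

```python
import numpy as np

arr = np.array([[214, 151, 216],
                [206,  10, 162],
                [176,  99, 229]])
threshold = arr.max() // 2  # 114 for this array

# Method #1: np.where
by_where = np.where(arr > threshold, 255, 0)

# Method #2: boolean indexing into a zero-filled array of the same shape
by_index = np.zeros_like(arr)
by_index[arr > threshold] = 255

# Method #3: multiply the boolean mask, relying on True == 1 and False == 0
by_arithmetic = (arr > threshold) * 255

assert np.array_equal(by_where, by_index)
assert np.array_equal(by_where, by_arithmetic)
print(by_where)
```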


For a 1000x1000 array, it looks like the arithmetic hack is fastest for me, but to be honest I'd use np.where because I think it's clearest:

>>> %timeit np.where(arr > threshold, 255, 0)
100 loops, best of 3: 12.3 ms per loop
>>> %timeit up = arr > threshold; new_arr = np.zeros_like(arr); new_arr[up] = 255;
100 loops, best of 3: 14.2 ms per loop
>>> %timeit (arr > threshold) * 255
100 loops, best of 3: 6.05 ms per loop
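
These numbers depend on the machine, the array dtype and the NumPy version, so it's worth re-running the comparison yourself. A rough sketch for doing that outside IPython with the standard timeit module, assuming a random 1000x1000 integer array as test data (the function names are made up for this example):

```python
import timeit

import numpy as np

# Assumed test data: a random 1000x1000 array of values in 0..255
rng = np.random.default_rng(0)
arr = rng.integers(0, 256, size=(1000, 1000))
threshold = arr.max() // 2

def with_where():
    return np.where(arr > threshold, 255, 0)

def with_boolean_index():
    new_arr = np.zeros_like(arr)
    new_arr[arr > threshold] = 255
    return new_arr

def with_arithmetic():
    return (arr > threshold) * 255

timings = {}
for func in (with_where, with_boolean_index, with_arithmetic):
    # Best of 3 repeats of 10 calls each, converted to milliseconds per call
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    timings[func.__name__] = best * 1e3
    print(f"{func.__name__}: {timings[func.__name__]:.2f} ms per call")
```

Taking the minimum over the repeats mirrors the "best of 3" that %timeit reports above.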
DSM answered Oct 06 '22 05:10