I want the resulting array to be binary: yes (255) or no (0).
I came up with:
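import numpy
import PIL.Image

img = PIL.Image.open(filename)  # filename: path to the image, defined elsewhere
array = numpy.array(img)  # assumes a 2-D (grayscale) image
thresholded_array = numpy.copy(array)
brightest = numpy.amax(array)
threshold = brightest / 2
rows, cols = array.shape  # instead of the hard-coded 490 x 490
for b in range(rows):
    for c in range(cols):
        if array[b][c] > threshold:
            thresholded_array[b][c] = 255
        else:
            thresholded_array[b][c] = 0
out = PIL.Image.fromarray(thresholded_array)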
But iterating over the array one value at a time is very slow, and I know there must be a faster way. What's the fastest?
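As a baseline, here is a minimal sketch of the question's loop wrapped in a function (the name `threshold_loop` and the random test image are hypothetical, for illustration only) and checked against a single whole-array comparison. It assumes a 2-D grayscale `uint8` array, like the one `numpy.array(img)` returns for an 8-bit grayscale image.

```python
import numpy as np

def threshold_loop(array, threshold):
    # The question's element-by-element loop, generalised: array.shape
    # replaces the hard-coded 490 and range replaces Python 2's xrange.
    out = np.copy(array)
    rows, cols = array.shape
    for b in range(rows):
        for c in range(cols):
            out[b, c] = 255 if array[b, c] > threshold else 0
    return out

# Hypothetical test data standing in for a 2-D grayscale image.
rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(60, 80), dtype=np.uint8)
threshold = image.max() / 2

looped = threshold_loop(image, threshold)
# One comparison over the whole array builds the same result without a Python loop.
vectorized = np.where(image > threshold, 255, 0).astype(image.dtype)
print(np.array_equal(looped, vectorized))
```

The loop's cost grows with the number of pixels, each visited by the Python interpreter, while the whole-array comparison runs inside NumPy's compiled code.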
Instead of looping, you can compare the entire array at once. There are several ways to do this. Starting from:
>>> import numpy as np
>>> arr = np.random.randint(0, 255, (3,3))
>>> brightest = arr.max()
>>> threshold = brightest // 2
>>> arr
array([[214, 151, 216],
       [206,  10, 162],
       [176,  99, 229]])
>>> brightest
229
>>> threshold
114
Method #1: use np.where:
>>> np.where(arr > threshold, 255, 0)
array([[255, 255, 255],
       [255,   0, 255],
       [255,   0, 255]])
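One detail worth checking with `np.where`: when both choices are plain Python ints, the result uses NumPy's default integer type rather than the input's `uint8`. The sketch below (the helper name `binarize_where` is hypothetical) casts back to `uint8`, which is the usual type for an 8-bit grayscale image, and reuses the 3x3 values from the session above.

```python
import numpy as np

def binarize_where(arr, threshold):
    # np.where picks 255 where the condition is True and 0 elsewhere.
    # The Python-int choices give NumPy's default integer dtype,
    # so cast to uint8 for 8-bit image use.
    return np.where(arr > threshold, 255, 0).astype(np.uint8)

arr = np.array([[214, 151, 216],
                [206,  10, 162],
                [176,  99, 229]], dtype=np.uint8)
threshold = arr.max() // 2   # 229 // 2 == 114
result = binarize_where(arr, threshold)
print(result)
```

For the question's image, something along the lines of `PIL.Image.fromarray(binarize_where(numpy.array(img.convert("L")), threshold))` should then give a black-and-white 8-bit image; the `convert("L")` step is an assumption, only needed if the input is not already grayscale.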
Method #2: use boolean indexing to create a new array
>>> up = arr > threshold
>>> new_arr = np.zeros_like(arr)
>>> new_arr[up] = 255
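A self-contained sketch of the boolean-indexing approach, with comments on what each step does. A side effect worth noting is that `np.zeros_like` copies both the shape and the dtype of `arr`, so a `uint8` image stays `uint8` without an explicit cast. The 3x3 values are the ones from the session above.

```python
import numpy as np

arr = np.array([[214, 151, 216],
                [206,  10, 162],
                [176,  99, 229]], dtype=np.uint8)
threshold = arr.max() // 2

up = arr > threshold            # boolean mask with the same shape as arr
new_arr = np.zeros_like(arr)    # zeros with arr's shape and dtype (uint8 here)
new_arr[up] = 255               # set only the positions where the mask is True
print(new_arr)
```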
Method #3: do the same, but use an arithmetic hack
>>> (arr > threshold) * 255
array([[255, 255, 255],
       [255,   0, 255],
       [255,   0, 255]])
which works because False == 0 and True == 1.
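A small sketch of why the arithmetic hack works, again on the 3x3 values from above. Multiplying a boolean mask by a Python int produces an integer array (not `uint8`), so, as with `np.where`, a cast is a reasonable extra step if the result is going into an 8-bit image.

```python
import numpy as np

arr = np.array([[214, 151, 216],
                [206,  10, 162],
                [176,  99, 229]], dtype=np.uint8)
threshold = arr.max() // 2

mask = arr > threshold
# In arithmetic, booleans behave as the integers 0 and 1.
print(int(True), int(False))
scaled = mask * 255                # 0 where False, 255 where True
as_image = scaled.astype(np.uint8) # explicit cast for 8-bit image use
print(as_image)
```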
For a 1000x1000 array, the arithmetic hack looks fastest for me, but to be honest I'd use np.where because I think it's clearest:
>>> %timeit np.where(arr > threshold, 255, 0)
100 loops, best of 3: 12.3 ms per loop
>>> %timeit up = arr > threshold; new_arr = np.zeros_like(arr); new_arr[up] = 255;
100 loops, best of 3: 14.2 ms per loop
>>> %timeit (arr > threshold) * 255
100 loops, best of 3: 6.05 ms per loop
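The `%timeit` lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard-library `timeit` module; the function names below are hypothetical, and the absolute times will depend on the machine and NumPy version, so no expected numbers are given. The script first checks that the three methods agree before timing them.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 255, size=(1000, 1000))
threshold = arr.max() // 2

def with_where():
    return np.where(arr > threshold, 255, 0)

def with_boolean_index():
    new_arr = np.zeros_like(arr)
    new_arr[arr > threshold] = 255
    return new_arr

def with_multiply():
    return (arr > threshold) * 255

candidates = {
    "np.where": with_where,
    "boolean indexing": with_boolean_index,
    "multiply by 255": with_multiply,
}

# Sanity check: all three methods must produce the same values.
reference = with_where()
for name, fn in candidates.items():
    assert np.array_equal(fn(), reference), name

# Best of 3 repeats, 10 calls each, reported per call.
for name, fn in candidates.items():
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name:>18}: {best * 1e3:.2f} ms per call")
```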