The details of the prerequisites for this code are quite long, so I'll do my best to summarize. WBColor, RGColor and BYColor are the base images, FIDO is an overlay that gets applied to them, and S_wb, S_rg and S_by are the final output images. WBColor, RGColor and BYColor are the same size as FIDO.
For each unique element in FIDO, we want to calculate the average color of that region within the base images. The code below does this, but since numFIDOs is very large (up to 40,000), it takes a long time.
The averages are computed separately for the three color channels.
sX = 200
sY = 200
S_wb = np.zeros((sX, sY))
S_rg = np.zeros((sX, sY))
S_by = np.zeros((sX, sY))
uniqueFIDOs, unique_counts = np.unique(FIDO, return_counts=True)
numFIDOs = uniqueFIDOs.shape
for i in np.arange(0, numFIDOs[0]):
    Lookup = FIDO == uniqueFIDOs[i]
    # Get average of color signals for this FIDO
    S_wb[Lookup] = np.sum(WBColor[Lookup]) / unique_counts[i]
    S_rg[Lookup] = np.sum(RGColor[Lookup]) / unique_counts[i]
    S_by[Lookup] = np.sum(BYColor[Lookup]) / unique_counts[i]
This takes about 7.89 seconds to run, which is not that long by itself, but it will sit inside another loop, so the time adds up!
I have tried to vectorize it (shown below), but I couldn't get it to work:
FIDOsize = unique_counts[0:numFIDOs[0]:1]
Lookup = FIDO == uniqueFIDOs[0:numFIDOs[0]:1]
S_wb[Lookup] = np.sum(WBColor[Lookup])/FIDOsize
S_rg[Lookup] = np.sum(RGColor[Lookup])/FIDOsize
S_by[Lookup] = np.sum(BYColor[Lookup])/FIDOsize
This fails with an array size mismatch error, because FIDO cannot be compared (broadcast) against the whole vector of unique values at once.
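For what it's worth, the averaging step can be fully vectorized with np.unique(..., return_inverse=True) and np.bincount. The following is only a sketch of that idea (my addition, not code from the original post), reusing the arrays defined above:
# Sketch of a fully vectorized version.
# return_inverse gives, for every pixel, the index of its FIDO label, so
# np.bincount can accumulate per-label sums of each color plane in one pass.
uniqueFIDOs, inverse, unique_counts = np.unique(FIDO, return_inverse=True, return_counts=True)
inverse = inverse.ravel()  # make sure the inverse index is 1-D
for color, out in ((WBColor, S_wb), (RGColor, S_rg), (BYColor, S_by)):
    label_sums = np.bincount(inverse, weights=color.ravel())               # sum per FIDO label
    out[...] = (label_sums / unique_counts)[inverse].reshape(FIDO.shape)   # broadcast means back to pixels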
By my timing, this is about 10 times faster than your original method. I tested with these arrays:
import numpy as np
sX=200
sY=200
FIDO = np.random.randint(0, sX*sY, (sX, sY))
WBColor = np.random.randint(0, sX*sY, (sX, sY))
RGColor = np.random.randint(0, sX*sY, (sX, sY))
BYColor = np.random.randint(0, sX*sY, (sX, sY))
This is the part I timed:
import collections
colors = {'wb': WBColor, 'rg': RGColor, 'by': BYColor}
planes = colors.keys()
S = {plane: np.zeros((sX, sY)) for plane in planes}
for plane in planes:
    counts = collections.defaultdict(int)
    sums = collections.defaultdict(int)
    for (i, j), f in np.ndenumerate(FIDO):
        counts[f] += 1
        sums[f] += colors[plane][i, j]
    for (i, j), f in np.ndenumerate(FIDO):
        S[plane][i, j] = sums[f] / counts[f]
This is probably because, even though looping in Python is slow, this approach traverses the data fewer times. Note that the original version gets faster when there is a small number of unique values in FIDO, whereas this one takes roughly the same time in most cases.
As @lejlot suggested before, the code is quite hard to vectorize. It cannot be run in parallel unless you know in advance which pixels belong to each FIDO. I don't know whether what you call FIDO corresponds to superpixels, but I usually work with this kind of problem, and the best solution I've found so far is the following:
Flatten the data:
data = data.reshape(-1, 3)
labels = FIDO.copy()
Here data is your (Width, Height, 3) image, rather than the three separate vectors that you have. It gets flattened to (Width * Height, 3).
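If, as in the question, the three channels live in separate 2-D arrays, they can be stacked into that shape first. A small sketch (my addition, using the question's WBColor, RGColor and BYColor):
# Build a (Height, Width, 3) image from the three separate color planes
data = np.dstack([WBColor, RGColor, BYColor]).astype(np.float64)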
Relabel FIDO to the 0..N-1 range, where N is the number of unique FIDOs:
from skimage.segmentation import relabel_sequential
labels = relabel_sequential(labels)[0]
labels -= labels.min()
The above, from scikit-image, transforms your FIDO array to the [0, N-1] range, which is much easier to work with later.
Last, write a simple Cython function to calculate the mean for each FIDO (since the labels are now ordered from 0 to N-1, you can accumulate into a 1D array of length N):
import numpy as np

def fmeans(double[:, ::1] data, long[::1] labels, long nsp):
    cdef long n, N = labels.shape[0]
    cdef int K = data.shape[1]
    cdef double[:, ::1] F = np.zeros((nsp, K), np.float64)
    cdef int[::1] sizes = np.zeros(nsp, np.int32)
    cdef long l
    cdef int z
    cdef double t
    # Single pass over all pixels: accumulate per-label sums and counts
    for n in range(N):
        l = labels[n]
        sizes[l] += 1
        for z in range(K):
            t = data[n, z]
            F[l, z] += t
    # Divide each accumulated sum by the region size to obtain the mean
    for n in range(nsp):
        for z in range(K):
            F[n, z] /= sizes[n]
    return np.asarray(F)
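Compiling the function requires a small build step. A minimal setup.py sketch, assuming the code above is saved in a file called fmeans.pyx (the filename is my assumption, not from the original answer):
# setup.py -- minimal build script for the Cython function above
# build in place with: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("fmeans.pyx"))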
You can then call that function (once compiled with Cython) as simply as:
mean_colors = fmeans(data, labels.flatten(), labels.max()+1) # labels.max()+1 == N
The image of mean colors can then be recovered as:
mean_img = mean_colors[labels]
If you do not want to write Cython, scikit-image also provides bindings for this using a graph structure and networkx, although it is much slower:
http://scikit-image.org/docs/dev/auto_examples/plot_rag_mean_color.html
The above example contains the function calls you need to obtain an image with the average color of each superpixel, with labels1 playing the role of your FIDO.
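If all you need is the mean-color image itself, a shorter pure scikit-image route is skimage.color.label2rgb with kind='avg'; this is my own sketch, not code from that example:
# Sketch: mean-color image per label region using scikit-image only
import numpy as np
from skimage.color import label2rgb

img = np.random.rand(200, 200, 3)                # stand-in for your (Height, Width, 3) image
labels = np.random.randint(0, 500, (200, 200))   # stand-in for FIDO
mean_img = label2rgb(labels, img, kind='avg', bg_label=-1)  # each region painted with its average color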
NOTE: The Cython approach is much faster because, instead of iterating over the N unique FIDOs and scanning the whole image (of size M = Width x Height) for each of them, it traverses the image only ONCE. The computational cost is therefore on the order of O(M + N) rather than the O(M * N) of your original approach. For the test below (M = 200 x 200 = 40,000 pixels, N ≈ 25,787 labels) that is roughly 66,000 operations instead of about a billion.
Example test:
import numpy as np
from skimage.segmentation import relabel_sequential
sX=200
sY=200
FIDO = np.random.randint(0, sX*sY, (sX, sY))
data = np.random.rand(sX, sY, 3) # Your image
Flatten and relabel:
data = data.reshape(-1, 3)
labels = relabel_sequential(FIDO)[0]
labels -= labels.min()
Obtain mean:
>>> %timeit color_means = fmeans(data, labels.flatten(), labels.max()+1)
1000 loops, best of 3: 520 µs per loop
It takes 0.5 ms (half a millisecond) for a 200x200 image, where:
print labels.max()+1 # --> 25787 unique FIDO
print color_means.shape # --> (25287, 3), the mean color of each FIDO
You can restore the image of mean colors using smart indexing:
mean_image = color_means[labels]
print mean_image.shape # --> (200, 200, 3)
I doubt you can get that speed with raw python approaches (or at least, I didn't find how).