Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I improve the efficiency of this numpy loop

I've got a numpy array containing labels. I'd like to get calculate a number for each label based on its size and bounding box. How can I write this more efficiently so that it's realistic to use on large arrays (~15000 labels)?

A = array([[ 1, 1, 0, 3, 3],
           [ 1, 1, 0, 0, 0],
           [ 1, 0, 0, 2, 2],
           [ 1, 0, 2, 2, 2]] )

B = zeros( 4 )

for label in range(1, 4):
    # get the bounding box of the label
    label_points = argwhere( A == label )
    (y0, x0), (y1, x1) = label_points.min(0), label_points.max(0) + 1

    # assume I've computed the size of each label in a numpy array size_A
    B[ label ] = myfunc(y0, x0, y1, x1, size_A[label])
like image 490
ajwood Avatar asked Nov 23 '11 16:11

ajwood


People also ask

How can I make NumPy faster?

By explicitly declaring the "ndarray" data type, your array processing can be 1250x faster. This tutorial will show you how to speed up the processing of NumPy arrays using Cython. By explicitly specifying the data types of variables in Python, Cython can give drastic speed increases at runtime.

How efficient is NumPy?

Because the Numpy array is densely packed in memory due to its homogeneous type, it also frees the memory faster. So overall a task executed in Numpy is around 5 to 100 times faster than the standard python list, which is a significant leap in terms of speed.

How does NumPy optimize?

NumPy allows arrays to only have a single data type and stores the data internally in a contiguous block of memory. Taking advantage of this fact, NumPy delegates most of the operations on such arrays to optimized, pre-compiled C code under the hood.

Is appending to NumPy array efficient?

Appending to numpy arrays is very inefficient. This is because the interpreter needs to find and assign memory for the entire array at every single step. Depending on the application, there are much better strategies. If you know the length in advance, it is best to pre-allocate the array using a function like np.


1 Answers

I wasn't really able to implement this efficiently using some NumPy vectorised functions, so maybe a clever Python implementation will be faster.

def first_row(a, labels):
    d = {}
    d_setdefault = d.setdefault
    len_ = len
    num_labels = len_(labels)
    for i, row in enumerate(a):
        for label in row:
            d_setdefault(label, i)
        if len_(d) == num_labels:
            break
    return d

This function returns a dictionary mapping each label to the index of the first row it appears in. Applying the function to A, A.T, A[::-1] and A.T[::-1] also gives you the first column as well as the last row and column.

If you would rather like a list instead of a dictionary, you can turn the dictionary into a list using map(d.get, labels). Alternatively, you can use a NumPy array instead of a dictionary right from the start, but you will lose the ability to leave the loop early as soon as all labels were found.

I'd be interested whether (and how much) this actually speeds up your code, but I'm confident that it is faster than your original solution.

like image 195
Sven Marnach Avatar answered Sep 27 '22 22:09

Sven Marnach