How can I improve the efficiency of this numpy loop

Tags:

python

optimization

numpy

I've got a NumPy array containing labels. I'd like to calculate a number for each label based on its size and bounding box. How can I write this more efficiently so that it's realistic to use on large arrays (~15000 labels)?

from numpy import array, zeros, argwhere

A = array([[ 1, 1, 0, 3, 3],
           [ 1, 1, 0, 0, 0],
           [ 1, 0, 0, 2, 2],
           [ 1, 0, 2, 2, 2]] )

B = zeros( 4 )

for label in range(1, 4):
    # get the bounding box of the label (y1 and x1 are exclusive)
    label_points = argwhere( A == label )
    (y0, x0), (y1, x1) = label_points.min(0), label_points.max(0) + 1

    # assume I've computed the size of each label in a numpy array size_A
    B[ label ] = myfunc(y0, x0, y1, x1, size_A[label])

asked Nov 23 '11 16:11

ajwood

1 Answer

I wasn't able to implement this efficiently with NumPy's vectorised functions, so maybe a clever pure-Python implementation will be faster.

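def first_row(a, labels):
    # NOTE: labels must list every distinct value in a (including the
    # background 0); otherwise the early exit below can trigger too soon.
    d = {}
    d_setdefault = d.setdefault  # cache lookups in locals for speed
    len_ = len
    num_labels = len_(labels)
    for i, row in enumerate(a):
        for label in row:
            d_setdefault(label, i)  # keeps only the first row seen
        if len_(d) == num_labels:
            break  # every label has been seen
    return d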

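This function returns a dictionary mapping each label to the index of the first row it appears in. Applying the function to A, A.T, A[::-1] and A.T[::-1] gives you the first row, the first column, the last row and the last column, respectively. For the two reversed arrays the index is counted from the end, so subtract it from the number of rows (or columns) minus one to get the actual position.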
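As a rough sketch of how the four calls might be combined into bounding boxes: the version of first_row below drops the local-variable micro-optimisations for readability, and the names top, left, bottom, right and boxes are only illustrative. It assumes labels lists every value in A, including the background 0, and it returns exclusive upper bounds to match the question's max(0) + 1 convention.

```python
import numpy as np

def first_row(a, labels):
    """Map each label to the index of the first row of `a` containing it."""
    d = {}
    for i, row in enumerate(a):
        for label in row:
            d.setdefault(label, i)
        if len(d) == len(labels):
            break
    return d

A = np.array([[1, 1, 0, 3, 3],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 2, 2],
              [1, 0, 2, 2, 2]])
labels = [0, 1, 2, 3]  # assumption: every value in A, background 0 included
n_rows, n_cols = A.shape

top = first_row(A, labels)            # first row of each label
left = first_row(A.T, labels)         # first column of each label
bottom = first_row(A[::-1], labels)   # row index counted from the bottom
right = first_row(A.T[::-1], labels)  # column index counted from the right

# (y0, x0, y1, x1) with exclusive upper bounds; background 0 is skipped.
boxes = {
    label: (top[label], left[label],
            n_rows - bottom[label], n_cols - right[label])
    for label in labels if label != 0
}
print(boxes)  # {1: (0, 0, 4, 2), 2: (2, 2, 4, 5), 3: (0, 3, 1, 5)}
```

Each tuple could then be passed to myfunc together with size_A[label], as in the question's loop.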

If you would rather have a list than a dictionary, you can convert it using map(d.get, labels) (wrapped in list(...) on Python 3, where map returns an iterator). Alternatively, you can use a NumPy array instead of a dictionary right from the start, but you will lose the ability to leave the loop early as soon as all labels have been found.
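For comparison, here is a sketch of an array-based variant. It is a different technique from the loop above, not the same code with the dictionary swapped out: it uses NumPy's ufunc.at (np.minimum.at and np.maximum.at) to fold every pixel's coordinates into per-label arrays in one pass. The function name bounding_boxes and its num_labels parameter are made up for illustration, and labels are assumed to be non-negative integers below num_labels. Whether it beats the dictionary version probably depends on the NumPy version and the data, so it is worth timing.

```python
import numpy as np

def bounding_boxes(a, num_labels):
    """Return a (num_labels, 4) array of (y0, x0, y1, x1), upper bounds exclusive."""
    n_rows, n_cols = a.shape
    rows, cols = np.indices(a.shape)  # row and column index of every pixel
    flat = a.ravel()
    rows, cols = rows.ravel(), cols.ravel()

    # Start from sentinels so that any real coordinate replaces them.
    y0 = np.full(num_labels, n_rows)
    x0 = np.full(num_labels, n_cols)
    y1 = np.full(num_labels, -1)
    x1 = np.full(num_labels, -1)

    # Unbuffered in-place reductions, grouped by label value.
    np.minimum.at(y0, flat, rows)
    np.minimum.at(x0, flat, cols)
    np.maximum.at(y1, flat, rows)
    np.maximum.at(x1, flat, cols)

    return np.stack([y0, x0, y1 + 1, x1 + 1], axis=1)

A = np.array([[1, 1, 0, 3, 3],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 2, 2],
              [1, 0, 2, 2, 2]])

box_array = bounding_boxes(A, 4)
print(box_array[1:])  # rows for labels 1, 2 and 3
```

A label that never occurs keeps its sentinels, so its row ends up with y0 > y1, which can serve as an "empty" marker.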

I'd be interested to know whether (and by how much) this actually speeds up your code, but I'm confident that it is faster than your original solution.

answered Sep 27 '22 22:09

Sven Marnach
