How to efficiently find the bounding box of a collection of points?

I have several points stored in an array. I need to find the bounds of those points, i.e. the rectangle that encloses all of them. I know how to solve this in plain Python.

I would like to know whether there is a better way than taking the naive max and min over the array, or a built-in method that solves the problem.

points = [[1, 3], [2, 4], [4, 1], [3, 3], [1, 6]]
b = bounds(points) # the function I am looking for
# now b = [[1, 1], [4, 6]]

asked Sep 21 '17 04:09

2 Answers

My approach to getting performance is to push things down to C level whenever possible:

def bounding_box(points):
    x_coordinates, y_coordinates = zip(*points)

    return [(min(x_coordinates), min(y_coordinates)), (max(x_coordinates), max(y_coordinates))]

By my (crude) measure, this runs about 1.5 times faster than @ReblochonMasque's bounding_box_naive(), and it is clearly more elegant. ;-)
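To make the transposition step concrete, here is a small sketch of what zip(*points) produces for the sample data from the question. The names xs, ys and box are only illustrative and do not appear in the answer's code.

```python
# Sample data from the question.
points = [[1, 3], [2, 4], [4, 1], [3, 3], [1, 6]]

# zip(*points) passes each point as a separate argument to zip, which then
# groups all first items together and all second items together.
xs, ys = zip(*points)
print(xs)  # (1, 2, 4, 3, 1)
print(ys)  # (3, 4, 1, 3, 6)

# min() and max() then each scan a flat tuple of numbers.
box = [(min(xs), min(ys)), (max(xs), max(ys))]
print(box)  # [(1, 1), (4, 6)]
```

Note that the two-name unpacking assumes 2D points and a non-empty input; with an empty list it raises ValueError.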

answered Sep 29 '22 16:09

cdlane

You cannot do better than O(n), because you must traverse all the points to determine the max and min for x and y.

However, you can try to reduce the constant factor by traversing the list only once instead of four times. It is unclear whether that gives a better execution time, and if it does, it would only be for large collections of points.

[EDIT]: In fact it does not; the "naive" approach is the more efficient of the two.

Here is the "naive" approach (the faster of the two):

def bounding_box_naive(points):
    """returns a list containing the bottom left and the top right 
    points in the sequence
    Here, we use min and max four times over the collection of points
    """
    bot_left_x = min(point[0] for point in points)
    bot_left_y = min(point[1] for point in points)
    top_right_x = max(point[0] for point in points)
    top_right_y = max(point[1] for point in points)

    return [(bot_left_x, bot_left_y), (top_right_x, top_right_y)]

and the (maybe?) less naive:

def bounding_box(points):
    """returns a list containing the bottom left and the top right 
    points in the sequence
    Here, we traverse the collection of points only once, 
    to find the min and max for x and y
    """
    bot_left_x, bot_left_y = float('inf'), float('inf')
    top_right_x, top_right_y = float('-inf'), float('-inf')
    for x, y in points:
        bot_left_x = min(bot_left_x, x)
        bot_left_y = min(bot_left_y, y)
        top_right_x = max(top_right_x, x)
        top_right_y = max(top_right_y, y)

    return [(bot_left_x, bot_left_y), (top_right_x, top_right_y)]
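The single-pass version above calls min() or max() four times per point. One possible variation, sketched here under the assumption that the input is a non-empty iterable of (x, y) pairs, replaces those calls with plain comparisons. The name bounding_box_compare is hypothetical, and this variant is not covered by the measurements in this answer, so whether it changes the timing picture would need to be measured.

```python
def bounding_box_compare(points):
    """Single-pass bounding box using plain comparisons.

    Hypothetical variant, not part of the measurements in this answer.
    Assumes a non-empty iterable of (x, y) pairs.
    """
    iterator = iter(points)
    # Seed all four bounds with the first point (StopIteration if empty).
    first_x, first_y = next(iterator)
    min_x = max_x = first_x
    min_y = max_y = first_y
    for x, y in iterator:
        # Since min_x <= max_x always holds, a value cannot be both a new
        # minimum and a new maximum, so elif is safe here.
        if x < min_x:
            min_x = x
        elif x > max_x:
            max_x = x
        if y < min_y:
            min_y = y
        elif y > max_y:
            max_y = y
    return [(min_x, min_y), (max_x, max_y)]


print(bounding_box_compare([[1, 3], [2, 4], [4, 1], [3, 3], [1, 6]]))
# [(1, 1), (4, 6)]
```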

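Profiling results (IPython session; the point count passed to range() was varied to produce the sizes below, and in each pair of timings the first line is bounding_box_naive and the second is bounding_box):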

import random
points = [(random.randrange(-1000, 1000), random.randrange(-1000, 1000))  for _ in range(1000000)]

%timeit bounding_box_naive(points)
%timeit bounding_box(points)

size = 1,000 points

1000 loops, best of 3: 573 µs per loop
1000 loops, best of 3: 1.46 ms per loop

size = 10,000 points

100 loops, best of 3: 5.7 ms per loop
100 loops, best of 3: 14.7 ms per loop

size = 100,000 points

10 loops, best of 3: 66.8 ms per loop
10 loops, best of 3: 141 ms per loop

size = 1,000,000 points

1 loop, best of 3: 664 ms per loop
1 loop, best of 3: 1.47 s per loop

Clearly, the first ("naive") approach is faster, by a factor of roughly 2.1 to 2.6.
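The measurements above use IPython's %timeit magic. Outside IPython, a comparable measurement could be sketched with the standard-library timeit module, as below. The two functions are copied from this answer; the seed, the point count, the repeat settings and the timings dictionary are arbitrary choices for illustration, and the numbers will differ from machine to machine.

```python
import random
import timeit


def bounding_box_naive(points):
    """Four separate min/max passes over the points."""
    bot_left_x = min(point[0] for point in points)
    bot_left_y = min(point[1] for point in points)
    top_right_x = max(point[0] for point in points)
    top_right_y = max(point[1] for point in points)
    return [(bot_left_x, bot_left_y), (top_right_x, top_right_y)]


def bounding_box(points):
    """One pass over the points, updating all four bounds per point."""
    bot_left_x, bot_left_y = float('inf'), float('inf')
    top_right_x, top_right_y = float('-inf'), float('-inf')
    for x, y in points:
        bot_left_x = min(bot_left_x, x)
        bot_left_y = min(bot_left_y, y)
        top_right_x = max(top_right_x, x)
        top_right_y = max(top_right_y, y)
    return [(bot_left_x, bot_left_y), (top_right_x, top_right_y)]


random.seed(0)  # arbitrary seed, only so that runs are repeatable
points = [(random.randrange(-1000, 1000), random.randrange(-1000, 1000))
          for _ in range(10_000)]

# Both functions should agree before their speed is compared.
assert bounding_box_naive(points) == bounding_box(points)

calls_per_run = 5
timings = {}
for func in (bounding_box_naive, bounding_box):
    # repeat() returns one total time per run; the minimum is the least noisy.
    best = min(timeit.repeat(lambda: func(points), number=calls_per_run, repeat=3))
    timings[func.__name__] = best / calls_per_run
    print(f"{func.__name__}: {timings[func.__name__] * 1e3:.2f} ms per call")
```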

answered Sep 29 '22 18:09

Reblochon Masque
