Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding largest inscribed rectangle of a polygon with Shapely

I am trying to locate millions of points inside a half-dozen polygons. Here's my code:

def find_shape(longitude,latitude):
    if longitude != 0 and latitude != 0:
        point = shapely.geometry.Point(longitude,latitude)
    else:
        return "Unknown"
    for current_shape in all_shapes:
        if current_shape['bounding_box'].contains(point):
            if current_shape['shape'].contains(point):
                return current_shape['properties']['ShapeName']
                break
    return "Unknown"

I've read the other questions that deal with improving performance for point-in-polygon queries with shapely. They suggest Rtrees. However, it seems like this is useful for cases when there are many polygons (36,000 in one question, 100,000 in another) and it is not desirable to loop over them all.

I am already setting up a bounding box, as you can see. Here's my shape setup code:

with fiona.open(SHAPEFILE) as f_col:
    all_shapes = []
    for shapefile_record in f_col:
        current_shape = {}
        current_shape['shape'] = shapely.geometry.asShape(shapefile_record['geometry'])
        minx, miny, maxx, maxy = current_shape['shape'].bounds
        current_shape['bounding_box'] = shapely.geometry.box(minx, miny, maxx, maxy)
        current_shape['properties'] = shapefile_record['properties']
        all_shapes.append(current_shape)

Would it be useful to also check another very simplified version of the shape, namely one made of the largest inscribed rectangle (or maybe triangle)?

Checking the shapely docs, it doesn't seem there's a function for this. Maybe some setting of simplify()? Of course, I always want to make sure the new simplified shape does not extend beyond the bounds of the original shape, so I don't have to call contains() on the actual shape. I also think I want to make the new simplified shape as simple as possible, for speed.

Any other suggestions appreciated for as well. Thanks!

EDIT: While awaiting replies, I have hit on this idea for creating a simplified shape meeting my requirements:

current_shape['simple_shape'] = current_shape['shape'].simplify(.5)
current_shape['simple_shape'] = current_shape['simple_shape'].intersection(current_shape['shape'])

Here's how I use it when testing each point:

if current_shape['simple_shape'].contains(point):
    return current_shape['properties']['ShapeName']
elif current_shape['shape'].contains(point):
    return current_shape['properties']['ShapeName']

This is not perfect, because the shape is not as simple as it could be after doing the necessary intersection(). Nevertheless, this approach has yielded a 60% decrease in processing time. In my tests, the simple polygon is used on 85% of point queries.

EDIT 2: Another related question over on GIS StackExchange: Python Efficiency — Need suggestions about how to use OGR and Shapely in more efficient way. This deals with 1.5 million points in about 3,000 polygons.

like image 700
Martin Burch Avatar asked Mar 11 '15 14:03

Martin Burch


1 Answers

I would use an R-Tree. But I'd insert all your points (and not the polygon's bounding box) into the R-Tree.

use r tree index for example: http://toblerity.org/rtree/

from rtree import index
from rtree.index import Rtree

idx = index.Index();

// Inserting a point, i.e. where left == right && top == bottom, will essentially insert a single point entry into the index

for current_point in all_points:
    idx.insert(current_point.id, (current_point.x, current_point.y, current_point.x, current_point.y))

// Now loop through your polygons

for current_shape in all_shapes:
   for n in idx.intersect( current_shape['bounding_box'] )
      if current_shape['shape'].contains(n):
         # now we know that n is inside the current shape

So you end up with half a dozen queries on your larger R-Tree rather than with millions of queries on a small R-Tree.

like image 137
Markus Schumann Avatar answered Oct 20 '22 19:10

Markus Schumann