I am trying to locate millions of points inside a half-dozen polygons. Here's my code:
def find_shape(longitude,latitude):
if longitude != 0 and latitude != 0:
point = shapely.geometry.Point(longitude,latitude)
else:
return "Unknown"
for current_shape in all_shapes:
if current_shape['bounding_box'].contains(point):
if current_shape['shape'].contains(point):
return current_shape['properties']['ShapeName']
break
return "Unknown"
I've read the other questions that deal with improving performance for point-in-polygon queries with shapely. They suggest Rtrees. However, it seems like this is useful for cases when there are many polygons (36,000 in one question, 100,000 in another) and it is not desirable to loop over them all.
I am already setting up a bounding box, as you can see. Here's my shape setup code:
with fiona.open(SHAPEFILE) as f_col:
all_shapes = []
for shapefile_record in f_col:
current_shape = {}
current_shape['shape'] = shapely.geometry.asShape(shapefile_record['geometry'])
minx, miny, maxx, maxy = current_shape['shape'].bounds
current_shape['bounding_box'] = shapely.geometry.box(minx, miny, maxx, maxy)
current_shape['properties'] = shapefile_record['properties']
all_shapes.append(current_shape)
Would it be useful to also check another very simplified version of the shape, namely one made of the largest inscribed rectangle (or maybe triangle)?
Checking the shapely docs, it doesn't seem there's a function for this. Maybe some setting of simplify()
? Of course, I always want to make sure the new simplified shape does not extend beyond the bounds of the original shape, so I don't have to call contains()
on the actual shape. I also think I want to make the new simplified shape as simple as possible, for speed.
Any other suggestions appreciated for as well. Thanks!
EDIT: While awaiting replies, I have hit on this idea for creating a simplified shape meeting my requirements:
current_shape['simple_shape'] = current_shape['shape'].simplify(.5)
current_shape['simple_shape'] = current_shape['simple_shape'].intersection(current_shape['shape'])
Here's how I use it when testing each point:
if current_shape['simple_shape'].contains(point):
return current_shape['properties']['ShapeName']
elif current_shape['shape'].contains(point):
return current_shape['properties']['ShapeName']
This is not perfect, because the shape is not as simple as it could be after doing the necessary intersection()
. Nevertheless, this approach has yielded a 60% decrease in processing time. In my tests, the simple polygon is used on 85% of point queries.
EDIT 2: Another related question over on GIS StackExchange: Python Efficiency — Need suggestions about how to use OGR and Shapely in more efficient way. This deals with 1.5 million points in about 3,000 polygons.
I would use an R-Tree. But I'd insert all your points (and not the polygon's bounding box) into the R-Tree.
use r tree index for example: http://toblerity.org/rtree/
from rtree import index
from rtree.index import Rtree
idx = index.Index();
// Inserting a point, i.e. where left == right && top == bottom, will essentially insert a single point entry into the index
for current_point in all_points:
idx.insert(current_point.id, (current_point.x, current_point.y, current_point.x, current_point.y))
// Now loop through your polygons
for current_shape in all_shapes:
for n in idx.intersect( current_shape['bounding_box'] )
if current_shape['shape'].contains(n):
# now we know that n is inside the current shape
So you end up with half a dozen queries on your larger R-Tree rather than with millions of queries on a small R-Tree.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With