Enhance performance of geopandas overlay(intersection)

I have two sets of shapefiles containing polygons. One set contains just the US counties I'm interested in, which vary across firms and years. The other set contains the firms' business areas, which also vary across firms and years. I need the intersection of these two layers for each firm in each year. So far, overlay(df1, df2, how='intersection') does what I need, but it takes around 300 s per firm-year. Given that I have a long list of firms and many years, this would take days to finish. Is there any way to improve the performance?

I noticed that the same operation in ArcGIS takes only a few seconds instead of 300 s. But I'm new to ArcGIS and not yet familiar with its Python interface.

Crystie asked Nov 22 '16 19:11


2 Answers

If you look at the current geopandas overlay source code, you'll see that the overlay function has been updated to use R-tree spatial indexing. At this point, I don't think building the R-tree manually would be any faster (it would probably be slower).

See source code here: https://github.com/geopandas/geopandas/blob/master/geopandas/tools/overlay.py
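
As a minimal sketch of what relying on the built-in implementation looks like, the snippet below calls overlay directly on two tiny, made-up layers. The column names (`county`, `firm`) and the box geometries are illustrative assumptions, not the asker's data, and whether the spatial index is used internally depends on the installed geopandas version.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-ins for the two layers in the question.
counties = gpd.GeoDataFrame(
    {"county": ["a", "b"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)
areas = gpd.GeoDataFrame(
    {"firm": ["x"]},
    geometry=[box(0.5, 0, 1.5, 1)],
)

# One call per firm-year; recent geopandas versions handle the
# candidate filtering internally, so no manual R-tree code is needed.
result = gpd.overlay(counties, areas, how="intersection")
print(result[["county", "firm"]])
```

If this is still slow on real data, it may be worth checking the installed geopandas version before writing any manual indexing code.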

pasalacquaian answered Nov 14 '22 11:11


Hopefully you've figured this out by now, but the solution is to use GeoPandas' R-tree spatial index. Implemented appropriately, it can give an orders-of-magnitude speedup.

Geoff Boeing has written an excellent tutorial:

http://geoffboeing.com/2016/10/r-tree-spatial-index-python/
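
A rough sketch of the idea, assuming a single business-area polygon and a GeoDataFrame of counties: use the spatial index to find counties whose bounding boxes overlap the polygon's bounding box, then run the exact (expensive) intersection only on those candidates. The helper name `prefilter_by_sindex` and the sample geometries are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

def prefilter_by_sindex(counties, area_geom):
    """Return only the counties that actually intersect area_geom."""
    # Cheap step: positions of rows whose bounding box overlaps the
    # bounding box of area_geom, looked up through the spatial index.
    candidate_pos = sorted(counties.sindex.intersection(area_geom.bounds))
    candidates = counties.iloc[candidate_pos]
    # Exact step: precise geometric test, run on the candidates only.
    return candidates[candidates.intersects(area_geom)]

# Hypothetical data: three square "counties" and one business area.
counties = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
)
area = box(0.5, 0.25, 1.5, 0.75)

hits = prefilter_by_sindex(counties, area)
# Clip the surviving counties to the business area.
clipped = hits.geometry.intersection(area)
print(hits["name"].tolist())
```

The far-away county `c` is discarded by the bounding-box lookup, so the exact intersection never touches it; with thousands of counties, that filtering is where the savings come from.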

andrew answered Nov 14 '22 11:11