
Using Geopandas, how do I select all points not within a polygon?

I have a DataFrame of Chicago addresses that I've geocoded into latitude and longitude values and then converted into Point objects (making the DataFrame a GeoDataFrame). A small fraction were geocoded incorrectly, with lat/long values outside of Chicago. I also have a shapefile of Chicago's boundary (loaded as a GeoDataFrame), and I want to select all rows whose Points fall outside of that boundary polygon.

It would be easy to select all points within the polygon (via GeoPandas' sjoin function), but I haven't found a good way to select the points that are not within it. Does one exist?

asked Oct 02 '18 01:10 by MattTriano


1 Answer

If you convert the Chicago boundary GeoDataFrame to a single polygon, e.g. with:

chicago = df_chicago.geometry.unary_union

then you can use boolean filtering with the within operator to select points within and outside of Chicago:

within_chicago = df[df.geometry.within(chicago)]
outside_chicago = df[~df.geometry.within(chicago)]

using ~ to invert the boolean condition.
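
To make that concrete, here is a minimal sketch on toy data. The two unit squares standing in for the Chicago boundary, the `address` column, and the point coordinates are all made up for illustration; only `df`, `df_chicago`, and `chicago` follow the names used above. Newer GeoPandas releases deprecate `unary_union` in favour of `union_all()`, so the sketch uses whichever one is available.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical stand-in for the boundary shapefile: two adjacent squares
# stored as separate rows, so the union step actually has something to merge.
df_chicago = gpd.GeoDataFrame(
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# Hypothetical geocoded addresses: two inside the boundary, one far outside.
df = gpd.GeoDataFrame(
    {"address": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(3.0, 3.0)],
    crs="EPSG:4326",
)

# Collapse all boundary rows into one geometry. union_all() exists in newer
# GeoPandas versions; unary_union is the older spelling used in the answer.
geoms = df_chicago.geometry
chicago = geoms.union_all() if hasattr(geoms, "union_all") else geoms.unary_union

# Compute the boolean mask once and reuse it for both selections.
mask = df.geometry.within(chicago)
within_chicago = df[mask]
outside_chicago = df[~mask]

print(outside_chicago["address"].tolist())
```

Computing the mask once avoids running the spatial test twice when you need both subsets. Both GeoDataFrames should share the same CRS before the comparison.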

Alternatively, you could use the disjoint spatial predicate:

outside_chicago = df[df.geometry.disjoint(chicago)]
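
Note that the two approaches can disagree for a point lying exactly on the boundary: such a point is not `within` the polygon, but it is not `disjoint` from it either. A small sketch, assuming a made-up square boundary and three hypothetical labelled points:

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical square boundary and three points: inside, on the edge, outside.
boundary = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
pts = gpd.GeoDataFrame(
    {"label": ["inside", "on_edge", "outside"]},
    geometry=[Point(1, 1), Point(2, 1), Point(3, 1)],
)

# ~within keeps everything that is not strictly inside the polygon,
# so the point sitting on the edge is selected as well.
not_within = pts[~pts.geometry.within(boundary)]

# disjoint keeps only geometries that share no point with the polygon,
# so the point on the edge is excluded.
disjoint = pts[pts.geometry.disjoint(boundary)]

print(not_within["label"].tolist())
print(disjoint["label"].tolist())
```

For geocoded addresses this edge case is unlikely to matter in practice, but it is worth knowing which behaviour you are getting.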

answered Oct 11 '22 08:10 by joris