I'm looking to do the equivalent of the ArcPy Generate Near Table using Geopandas / Shapely. I'm very new to Geopandas and Shapely and have developed a methodology that works, but I'm wondering if there is a more efficient way of doing it.
I have two point datasets: Census Block centroids and restaurants. I'm looking to find, for each Census Block centroid, the distance to its closest restaurant. There are no restrictions on the same restaurant being the closest restaurant for multiple blocks.
The reason this becomes a bit more complicated for me is because the Geopandas Distance function calculates elementwise, matching based on index. Therefore, my general methodology is to turn the Restaurants file into a multipoint file and then set the index of the blocks file to all be the same value. Then all of the block centroids and the restaurants have the same index value.
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon, Point, MultiPoint
Now read in the Block Centroid and Restaurant Shapefiles:
Blocks=gpd.read_file(BlockShp)
Restaurants=gpd.read_file(RestaurantShp)
Since the Geopandas distance function calculates distance elementwise, I convert the Restaurant GeoSeries to a MultiPoint GeoSeries:
RestMulti=gpd.GeoSeries(Restaurants.unary_union)
RestMulti.crs=Restaurants.crs
RestMulti = RestMulti.reset_index(drop=True)  # reset_index returns a new object, so assign it
Then I set the index for the Blocks equal to 0 (the same value as the Restaurants multipoint) as a work around for the elementwise calculation.
Blocks.index=[0]*len(Blocks)
Lastly, I use the Geopandas distance function to calculate the distance to the nearest restaurant for each Block centroid.
Blocks['Distance']=Blocks.distance(RestMulti)
Please offer any suggestions on how any aspect of this could be improved. I'm not tied to using Geopandas or Shapely, but I am looking to learn an alternative to ArcPy.
Thanks for the help!
The math.dist() method returns the Euclidean distance between two points p and q, each given as a sequence of coordinates. Note: the two points must have the same number of dimensions.
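As a quick illustration of math.dist() (in the standard library from Python 3.8 onward), here is a minimal sketch. The two points are made up for the example and form a 3-4-5 triangle.

```python
import math

# Two hypothetical 2-D points; both must have the same number of coordinates.
p = (0.0, 0.0)
q = (3.0, 4.0)

distance = math.dist(p, q)  # Euclidean distance, sqrt(3**2 + 4**2)
print(distance)
```

Note that this is a planar distance, so it is only meaningful for coordinates in a projected (Cartesian) system.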
For this, convert the longitude and latitude of both points from degrees to radians by dividing them by 180/pi, which is approximately 57.29577951 (pi is approximately 3.14159; 22/7 is only a rough approximation). To calculate the distance between two places in miles, use 3,963, the approximate radius of the Earth in miles.
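The text above does not say which great-circle formula it has in mind, so the sketch below assumes the spherical law of cosines. The function name great_circle_miles is hypothetical, and the radius is the approximate figure quoted above.

```python
import math

EARTH_RADIUS_MILES = 3963.0  # approximate radius quoted above

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    # math.radians divides by 180/pi, i.e. converts degrees to radians.
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    # Spherical law of cosines (an assumed choice of formula).
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(lam2 - lam1))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_MILES * math.acos(cos_angle)

# A quarter of the way around the equator.
print(great_circle_miles(0.0, 0.0, 0.0, 90.0))
```

This form loses precision for points that are very close together; the haversine formula is the usual alternative in that case.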
If I understand your issue correctly, Blocks and Restaurants can have very different numbers of rows. For this reason, it's probably a bad approach to try to force them into a table format by reindexing.
I would just loop over blocks and get the minimum distance to restaurants (just as @shongololo was suggesting).
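Stripped of any GIS library, the loop idea looks roughly like the sketch below. The coordinates are made up and assumed to be in a projected system with the same units; min_distances is a hypothetical helper name.

```python
import math

def min_distances(blocks, restaurants):
    """For each block point, return the distance to its closest restaurant."""
    return [min(math.dist(block, rest) for rest in restaurants)
            for block in blocks]

# Hypothetical projected coordinates (same units, e.g. meters).
blocks = [(0.0, 0.0), (5.0, 5.0)]
restaurants = [(3.0, 4.0), (6.0, 5.0)]

result = min_distances(blocks, restaurants)
print(result)
```

This compares every block with every restaurant, so the cost grows with the product of the two dataset sizes. That is fine for modest inputs, but a spatial index would be the usual next step for large ones.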
I'm going to be slightly more general (because I already have this code written down) and do a distance from points to lines, but the same code should work from points to points or from polygons to polygons. I'll start with a GeoDataFrame for the points and I'll create a new column which has the minimum distance to lines.
%matplotlib inline
import matplotlib.pyplot as plt
import shapely.geometry as geom
import numpy as np
import pandas as pd
import geopandas as gpd
lines = gpd.GeoSeries(
[geom.LineString(((1.4, 3), (0, 0))),
geom.LineString(((1.1, 2.), (0.1, 0.4))),
geom.LineString(((-0.1, 3.), (1, 2.)))])
# 10 points
n = 10
points = gpd.GeoSeries([geom.Point(x, y) for x, y in np.random.uniform(0, 3, (n, 2))])
# Put the points in a GeoDataFrame, with some other random column
df_points = gpd.GeoDataFrame({'Geometry': points, 'Property1': np.random.randn(n)}, geometry='Geometry')
points.plot()
lines.plot()
Now get the distance from points to lines and only save the minimum distance for each point (see below for a version with apply)
min_dist = np.empty(n)
for i, point in enumerate(points):
    min_dist[i] = np.min([point.distance(line) for line in lines])
df_points['min_dist_to_lines'] = min_dist
df_points.head(3)
which gives
                                       Geometry  Property1  min_dist_to_lines
0  POINT (0.2479424516236574 2.944916965334865)   2.621823           0.193293
1   POINT (1.465768457667432 2.605673714922998)  0.6074484           0.226353
2   POINT (2.831645235202689 1.125073838462032)   0.657191           1.940127
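To make the geometry behind point.distance(line) concrete, here is a standard-library sketch of the same minimum-distance computation for two-vertex lines. It is not how Shapely implements it, and the helper names are hypothetical. Shapely also handles multi-vertex lines, which this sketch does not.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all 2-D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment: a == b
    # Parameter of the projection of p onto the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def min_distance_to_segments(p, segments):
    """Smallest distance from p to any of the given segments."""
    return min(point_segment_distance(p, a, b) for a, b in segments)

# The three two-vertex lines from the example above.
segments = [((1.4, 3.0), (0.0, 0.0)),
            ((1.1, 2.0), (0.1, 0.4)),
            ((-0.1, 3.0), (1.0, 2.0))]
print(min_distance_to_segments((0.0, 0.0), segments))
```

The point (0, 0) is an endpoint of the first line, so its minimum distance is zero.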
---- EDIT ----
(taken from a GitHub issue) Using apply is nicer and more consistent with how you'd do it in pandas:
def min_distance(point, lines):
    # distance from one point to every line, reduced to the smallest value
    return lines.distance(point).min()

df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, lines)
EDIT: As of at least 2019-10-04 it seems that a change in pandas requires a different input in the last code block, making use of the args parameter in .apply():
df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, args=(lines,))
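I will use two sample datasets in geopandas with different numbers of rows to demonstrate. (Note that gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0, so the loading lines need an older version or another data source.)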
import geopandas as gpd
# read geodata for five nyc boroughs
gdf_nyc = gpd.read_file(gpd.datasets.get_path('nybb'))
# read geodata for international cities
gdf_cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))
# reproject to EPSG:3857 (Web Mercator), whose units are meters
gdf_nyc.to_crs(epsg=3857, inplace=True)
gdf_cities.to_crs(epsg=3857, inplace=True)
We can simply apply a lambda function to the GeoSeries. For example, to get the minimal distance between each NYC borough (polygon) and its nearest international city (point), we can do the following:
gdf_nyc.geometry.apply(lambda x: gdf_cities.distance(x).min())
This will give us
0 384422.953323
1 416185.725507
2 412520.308816
3 419511.323677
4 440292.945096
Name: geometry, dtype: float64
Similarly, to get the minimal distance between each international city and its nearest NYC borough, we can do the following:
gdf_cities.geometry.apply(lambda x: gdf_nyc.distance(x).min())
This will give us
0 9.592104e+06
1 9.601345e+06
2 9.316354e+06
3 8.996945e+06
4 2.614927e+07
...
197 1.177410e+07
198 2.377188e+07
199 8.559704e+06
200 8.902146e+06
201 2.034579e+07
Name: geometry, Length: 202, dtype: float64
Notes:
- The example reprojects the data to epsg:3857, so the distance will be in meters. If you use an ellipsoidal (lon/lat based) coordinate system, the result will be in degrees. Convert your projection first, before anything else such as getting the centroids of your polygons.
- The .distance() method also makes sense when you want the distance between, say, a point and a line. In other words, the .distance() method can calculate the distance between any two geo-objects.
- If there are multiple geometry columns in a GeoDataFrame, make sure to apply the lambda function to the desired GeoSeries and also call the .distance() method from the desired GeoSeries. In the example, I called the method from the GeoDataFrame directly because both of them only have one GeoSeries column.
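To see why the choice of coordinate system changes the units, the sketch below compares the same pair of points in lon/lat degrees and in Web Mercator meters. It uses the spherical forward formula for EPSG:3857 written out by hand; to_web_mercator is a hypothetical helper, not a GeoPandas function, and the two points are made up.

```python
import math

R = 6378137.0  # sphere radius in meters used by EPSG:3857

def to_web_mercator(lon_deg, lat_deg):
    """Project lon/lat in degrees to Web Mercator x/y in meters."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = R * lam
    y = R * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y

# Two hypothetical points on the equator, one degree of longitude apart.
a_deg, b_deg = (0.0, 0.0), (1.0, 0.0)  # (lon, lat)

dist_degrees = math.dist(a_deg, b_deg)
dist_meters = math.dist(to_web_mercator(*a_deg), to_web_mercator(*b_deg))
print(dist_degrees, dist_meters)
```

Web Mercator stretches distances by roughly 1/cos(latitude) away from the equator, so at the latitude of New York the reported meters are noticeably larger than true ground distances. A local projected CRS is usually a better choice when accuracy matters.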