I have a geodataframe showing ~25 locations represented as point geometry. I am trying to come up with a script that goes through each point, identifies the nearest location and returns the name of the nearest location and the distance.
I can easily do this if I have different geodataframes using nearest_points(geom1, geom2) in the shapely.ops library. However all my locations are stored in one geodataframe. I am trying to loop through and that is where I am having trouble
here is my sample file:
geofile = gpd.GeoDataFrame([[0, 'location A', Point(55, 55)],
[1, 'location B', Point(66, 66)],
[2, 'Location C', Point(99, 99)],
[3, 'Location D', Point(11, 11)]],
columns=['ID','Location','geometry'])
Here is the loop I am creating to no avail.
for index, row in geofile.iterrows():
nearest_geoms=nearest_points(row, geofile)
print('location:' + nearest_geoms[0])
print('nearest:' + nearest_geoms[1])
print('-------')
I am getting this error:
AttributeError: 'Series' object has no attribute '_geom'
However I think my problem is beyond the error cause somehow I have to exclude the row I am looping through cause that will automatically return as the closest location since it is that location.
My end result for one location would be the following:
([0,'location A','location B', '5 miles', Point(55,55)], columns=['ID','Location','Nearest', 'Distance',geometry'])
Shapely's nearest_points function compares shapely geometries. To compare a single Point geometry against multiple other Point geometries, you can use .unary_union to compare against the resulting MultiPoint geometry. And yes, at each row operation, drop the respective point so it is not compared against itself.
import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import nearest_points
df = gpd.GeoDataFrame([[0, 'location A', Point(55,55)],
[1, 'location B', Point(66,66)],
[2, 'Location C', Point(99,99)],
[3, 'Location D' ,Point(11,11)]],
columns=['ID','Location','geometry'])
df.insert(3, 'nearest_geometry', None)
for index, row in df.iterrows():
point = row.geometry
multipoint = df.drop(index, axis=0).geometry.unary_union
queried_geom, nearest_geom = nearest_points(point, multipoint)
df.loc[index, 'nearest_geometry'] = nearest_geom
Resulting in
ID Location geometry nearest_geometry
0 0 location A POINT (55 55) POINT (66 66)
1 1 location B POINT (66 66) POINT (55 55)
2 2 Location C POINT (99 99) POINT (66 66)
3 3 Location D POINT (11 11) POINT (55 55)
Here is another approach based on scipy.spatial.distance.cdist
. The iterrows
is avoided by using numpy masked arrays.
import geopandas as gpd
from scipy.spatial import distance
import numpy.ma as ma
from shapely.geometry import Point
import numpy as np
df = gpd.GeoDataFrame([[0, 'location A', Point(55,55)],
[1, 'location B', Point(66,66)],
[2, 'Location C', Point(99,99)],
[3, 'Location D' ,Point(11,11)]],
columns=['ID','Location','geometry'])
coords = np.stack(df.geometry.apply(lambda x: [x.x, x.y]))
distance_matrix = ma.masked_where((dist := distance.cdist(*[coords] * 2)) == 0, dist)
df["closest_ID"] = np.argmin(distance_matrix, axis=0)
df = df.join(df.set_index("ID").geometry.rename("nearest_geometry"), on="closest_ID")
df.drop("closest_ID", axis=1)
# Out:
ID Location geometry nearest_geometry
0 0 location A POINT (55.000 55.000) POINT (66.00000 66.00000)
1 1 location B POINT (66.000 66.000) POINT (55.00000 55.00000)
2 2 Location C POINT (99.000 99.000) POINT (66.00000 66.00000)
3 3 Location D POINT (11.000 11.000) POINT (55.00000 55.00000)
Since the distance_matrix
contains the complete information on distances between all pairs of points, it is easy to generalize this approach to arbitrary numbers of neighbors. For example, if we are interested in finding the N_NEAREST = 2
neighbors for each point, we can sort the distance matrix (using np.argsort
, instead of picking the np.argmin
, as before) and select the corresponding number of columns:
nearest_id_cols = list(map("nearest_id_{}".format, range(1, N_NEAREST + 1)))
nearest_geom_cols = list(map("nearest_geometry_{}".format, range(1, N_NEAREST + 1)))
df[nearest_id_cols] = np.argsort(distance_matrix, axis=1)[:, :N_NEAREST]
df[nearest_geom_cols] = df[nearest_id_cols].applymap(
lambda x: df.set_index("ID").geometry[x])
# out:
ID Location geometry nearest_id_1 nearest_id_2 \
0 0 location A POINT (55.00000 55.00000) 1 2
1 1 location B POINT (66.00000 66.00000) 0 2
2 2 Location C POINT (99.00000 99.00000) 1 0
3 3 Location D POINT (11.00000 11.00000) 0 1
nearest_geometry_1 nearest_geometry_2
0 POINT (66 66) POINT (99 99)
1 POINT (55 55) POINT (99 99)
2 POINT (66 66) POINT (55 55)
3 POINT (55 55) POINT (66 66)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With