How to use geopanda or shapely to find nearest point in same geodataframe

Question

I have a geodataframe showing ~25 locations represented as point geometry. I am trying to come up with a script that goes through each point, identifies the nearest location and returns the name of the nearest location and the distance.

I can easily do this if I have different geodataframes using nearest_points(geom1, geom2) in the shapely.ops library. However all my locations are stored in one geodataframe. I am trying to loop through and that is where I am having trouble

here is my sample file:

geofile = gpd.GeoDataFrame([[0, 'location A', Point(55, 55)],
                            [1, 'location B', Point(66, 66)],
                            [2, 'Location C', Point(99, 99)],
                            [3, 'Location D', Point(11, 11)]],
                           columns=['ID','Location','geometry'])

Here is the loop I am creating to no avail.

for index, row in geofile.iterrows():
    nearest_geoms=nearest_points(row, geofile)
    print('location:' + nearest_geoms[0])
    print('nearest:' + nearest_geoms[1])
    print('-------')

I am getting this error:

AttributeError: 'Series' object has no attribute '_geom'

However I think my problem is beyond the error cause somehow I have to exclude the row I am looping through cause that will automatically return as the closest location since it is that location.

My end result for one location would be the following:

([0,'location A','location B', '5 miles', Point(55,55)], columns=['ID','Location','Nearest', 'Distance',geometry'])

Christoph Rieke · Accepted Answer

Shapely's nearest_points function compares shapely geometries. To compare a single Point geometry against multiple other Point geometries, you can use .unary_union to compare against the resulting MultiPoint geometry. And yes, at each row operation, drop the respective point so it is not compared against itself.

import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import nearest_points

df = gpd.GeoDataFrame([[0, 'location A', Point(55,55)], 
                       [1, 'location B', Point(66,66)],
                       [2, 'Location C', Point(99,99)],
                       [3, 'Location D' ,Point(11,11)]], 
                      columns=['ID','Location','geometry'])
df.insert(3, 'nearest_geometry', None)

for index, row in df.iterrows():
    point = row.geometry
    multipoint = df.drop(index, axis=0).geometry.unary_union
    queried_geom, nearest_geom = nearest_points(point, multipoint)
    df.loc[index, 'nearest_geometry'] = nearest_geom

Resulting in

    ID  Location    geometry        nearest_geometry
0   0   location A  POINT (55 55)   POINT (66 66)
1   1   location B  POINT (66 66)   POINT (55 55)
2   2   Location C  POINT (99 99)   POINT (66 66)
3   3   Location D  POINT (11 11)   POINT (55 55)

enter image description here

mcsoini · Answer

Here is another approach based on scipy.spatial.distance.cdist. The iterrows is avoided by using numpy masked arrays.

import geopandas as gpd
from scipy.spatial import distance
import numpy.ma as ma
from shapely.geometry import Point
import numpy as np

df = gpd.GeoDataFrame([[0, 'location A', Point(55,55)], 
                       [1, 'location B', Point(66,66)],
                       [2, 'Location C', Point(99,99)],
                       [3, 'Location D' ,Point(11,11)]], 
                      columns=['ID','Location','geometry'])

coords = np.stack(df.geometry.apply(lambda x: [x.x, x.y]))
distance_matrix = ma.masked_where((dist := distance.cdist(*[coords] * 2)) == 0, dist)
df["closest_ID"] = np.argmin(distance_matrix, axis=0)
df = df.join(df.set_index("ID").geometry.rename("nearest_geometry"), on="closest_ID")
df.drop("closest_ID", axis=1)

# Out:
   ID    Location               geometry           nearest_geometry
0   0  location A  POINT (55.000 55.000)  POINT (66.00000 66.00000)
1   1  location B  POINT (66.000 66.000)  POINT (55.00000 55.00000)
2   2  Location C  POINT (99.000 99.000)  POINT (66.00000 66.00000)
3   3  Location D  POINT (11.000 11.000)  POINT (55.00000 55.00000)

Generalization for multiple neighbors

Since the distance_matrix contains the complete information on distances between all pairs of points, it is easy to generalize this approach to arbitrary numbers of neighbors. For example, if we are interested in finding the N_NEAREST = 2 neighbors for each point, we can sort the distance matrix (using np.argsort, instead of picking the np.argmin, as before) and select the corresponding number of columns:

nearest_id_cols = list(map("nearest_id_{}".format, range(1, N_NEAREST + 1)))
nearest_geom_cols = list(map("nearest_geometry_{}".format, range(1, N_NEAREST + 1)))
df[nearest_id_cols] = np.argsort(distance_matrix, axis=1)[:, :N_NEAREST]
df[nearest_geom_cols] = df[nearest_id_cols].applymap(
                             lambda x: df.set_index("ID").geometry[x])

# out:
   ID    Location                  geometry  nearest_id_1  nearest_id_2  \
0   0  location A  POINT (55.00000 55.00000)             1             2   
1   1  location B  POINT (66.00000 66.00000)             0             2   
2   2  Location C  POINT (99.00000 99.00000)             1             0   
3   3  Location D  POINT (11.00000 11.00000)             0             1   

  nearest_geometry_1 nearest_geometry_2  
0       POINT (66 66)       POINT (99 99)  
1       POINT (55 55)       POINT (99 99)  
2       POINT (66 66)       POINT (55 55)  
3       POINT (55 55)       POINT (66 66)

How to use geopanda or shapely to find nearest point in same geodataframe

Tags:

python

gis

shapely

geopandas

rzt101

2 Answers

Christoph Rieke

Generalization for multiple neighbors

mcsoini

Recent Activity

Donate For Us

How to use geopanda or shapely to find nearest point in same geodataframe

Tags:

python

gis

shapely

geopandas

rzt101

2 Answers

Christoph Rieke

Generalization for multiple neighbors

mcsoini

Related questions

Recent Activity

Donate For Us