Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

K-nearest points from two dataframes with GeoPandas

GeoPandas uses shapely under the hood. To get the nearest neighbor I saw the use of nearest_points from shapely. However, this approach does not include k-nearest points.

I needed to compute distances to nearest points from to GeoDataFrames and insert the distance into the GeoDataFrame containing the "from this point" data.

This is my approach using GeoSeries.distance() without using another package or library. Note that when k == 1 the returned value essentially shows the distance to the nearest point. There is also a GeoPandas-only solution for nearest point by @cd98 which inspired my approach.

This works well for my data, but I wonder if there is a better or faster approach or another benefit to use shapely or sklearn.neighbors?

import pandas as pd
import geopandas as gp

gdf1 > GeoDataFrame with point type geometry column - distance from this point
gdf2 > GeoDataFrame with point type geometry column - distance to this point

def knearest(from_points, to_points, k):
    distlist = to_points.distance(from_points)
    distlist.sort_values(ascending=True, inplace=True) # To have the closest ones first
    return distlist[:k].mean()

# looping through a list of nearest points
for Ks in [1, 2, 3, 4, 5, 10]:
    name = 'dist_to_closest_' + str(Ks) # to set column name
    gdf1[name] = gdf1.geometry.apply(knearest, args=(gdf2, closest_x))
like image 871
raummensch Avatar asked Feb 18 '26 21:02

raummensch


2 Answers

yes there is, but first, I must credit the University of Helsinki from automating GIS process, here's the source code. Here's how
first, read the data, for example, finding nearest bus stops for each building.

# Filepaths
stops = gpd.read_file('data/pt_stops_helsinki.gpkg')
buildings = read_gdf_from_zip('data/building_points_helsinki.zip')

define the function, here, you can adjust the k_neighbors

from sklearn.neighbors import BallTree
import numpy as np

def get_nearest(src_points, candidates, k_neighbors=1):
    """Find nearest neighbors for all source points from a set of candidate points"""

    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='haversine')

    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)

    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()

    # Get closest indices and distances (i.e. array at index 0)
    # note: for the second closest points, you would take index 1, etc.
    closest = indices[0]
    closest_dist = distances[0]

    # Return indices and distances
    return (closest, closest_dist)


def nearest_neighbor(left_gdf, right_gdf, return_dist=False):
    """
    For each point in left_gdf, find closest point in right GeoDataFrame and return them.

    NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
    """

    left_geom_col = left_gdf.geometry.name
    right_geom_col = right_gdf.geometry.name

    # Ensure that index in right gdf is formed of sequential numbers
    right = right_gdf.copy().reset_index(drop=True)

    # Parse coordinates from points and insert them into a numpy array as RADIANS
    left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    right_radians = np.array(right[right_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())

    # Find the nearest points
    # -----------------------
    # closest ==> index in right_gdf that corresponds to the closest point
    # dist ==> distance between the nearest neighbors (in meters)

    closest, dist = get_nearest(src_points=left_radians, candidates=right_radians)

    # Return points from right GeoDataFrame that are closest to points in left GeoDataFrame
    closest_points = right.loc[closest]

    # Ensure that the index corresponds the one in left_gdf
    closest_points = closest_points.reset_index(drop=True)

    # Add distance if requested
    if return_dist:
        # Convert to meters from radians
        earth_radius = 6371000  # meters
        closest_points['distance'] = dist * earth_radius

    return closest_points

Do the nearest neighbours analysis

# Find closest public transport stop for each building and get also the distance based on haversine distance
# Note: haversine distance which is implemented here is a bit slower than using e.g. 'euclidean' metric
# but useful as we get the distance between points in meters
closest_stops = nearest_neighbor(buildings, stops, return_dist=True)

now join the from and to data frame

# Rename the geometry of closest stops gdf so that we can easily identify it
closest_stops = closest_stops.rename(columns={'geometry': 'closest_stop_geom'})

# Merge the datasets by index (for this, it is good to use '.join()' -function)
buildings = buildings.join(closest_stops)
like image 126
sutan Avatar answered Feb 20 '26 11:02

sutan


The answer above using Automating GIS-processes is really nice but there is an error when converting points as a numpy array as RADIANS. The latitude and longitude are reversed.

left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.y * np.pi / 180, geom.x * np.pi / 180)).to_list())

Indeed Points are given with (lat, lon) but the longitude correspond the x-axis of a plan or a sphere and the latitude to the y-axis.

like image 42
Jean-Louis Lamezec Avatar answered Feb 20 '26 11:02

Jean-Louis Lamezec



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!