I have two dataframes, both of which contain columns of latitude and longitude. For each lat/lon entry in the first dataframe, I want to evaluate each lat/lon pair in the second dataframe to determine distance.
For example:
df1:                      df2:
     lat     lon               lat     lon
0  38.32 -100.50          0  37.65  -97.87
1  42.51  -97.39          1  33.31  -96.40
2  33.45 -103.21          2  36.22 -100.01
distance between 38.32,-100.50 and 37.65,-97.87
distance between 38.32,-100.50 and 33.31,-96.40
distance between 38.32,-100.50 and 36.22,-100.01
distance between 42.51,-97.39 and 37.65,-97.87
distance between 42.51,-97.39 and 33.31,-96.40
...and so on...
I'm not sure how to go about doing this.
Thanks for the help!
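Euclidean distance is calculated as sqrt((lat1 - lat2)^2 + (lon1 - lon2)^2).
You can do this with your two dataframes like this. Note that it pairs the rows of df1 and df2 by index, and that it treats degrees as flat coordinates, so the result is in degrees rather than a distance on the Earth's surface: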
((df1 - df2) ** 2).sum(1) ** .5
0 2.714001
1 9.253113
2 4.232363
dtype: float64
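The expression above only compares rows that share an index. If every row of the first frame should be compared with every row of the second, one possible sketch uses NumPy broadcasting. The array names `a`, `b` and `pairwise` are illustrative and not from the original answer, and the values are still planar distances in degrees:

```python
import numpy as np

# Sample coordinates from the question, as (lat, lon) rows.
a = np.array([[38.32, -100.50], [42.51, -97.39], [33.45, -103.21]])
b = np.array([[37.65, -97.87], [33.31, -96.40], [36.22, -100.01]])

# Broadcasting (3, 1, 2) against (1, 3, 2) gives every pairwise difference.
diff = a[:, None, :] - b[None, :, :]

# pairwise[i, j] is the Euclidean distance between a[i] and b[j].
pairwise = np.sqrt((diff ** 2).sum(axis=-1))
```

The diagonal of `pairwise` holds the same values as the row-wise result above. With dataframes, `df1[['lat', 'lon']].to_numpy()` should produce arrays of the same shape.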
You can perform a cross join to get all combinations of lat/lon, then compute the distance using an appropriate measure. To do so, you can use the geopy package, which supplies geopy.distance.vincenty and geopy.distance.great_circle. Both give valid distances; vincenty is more accurate but computationally slower. (vincenty was deprecated in geopy 1.13 and removed in geopy 2.0; on newer versions, geopy.distance.geodesic is the replacement.)
import pandas as pd
from geopy.distance import vincenty

# Function to compute distances.
def get_lat_lon_dist(row):
    # Store lat/long as tuples for input into distance functions.
    latlon1 = tuple(row[['lat1', 'lon1']])
    latlon2 = tuple(row[['lat2', 'lon2']])
    # Compute the distance.
    return vincenty(latlon1, latlon2).km

# Perform a cross-join to get all combinations of lat/lon.
dist = pd.merge(df1.assign(k=1), df2.assign(k=1), on='k', suffixes=('1', '2')) \
         .drop('k', axis=1)

# Compute the distances between lat/longs
dist['distance'] = dist.apply(get_lat_lon_dist, axis=1)
I used kilometers as my units in the example, but others can be specified, e.g.:
vincenty(latlon1, latlon2).miles
The resulting output:
    lat1    lon1   lat2     lon2     distance
0  38.32 -100.50  37.65   -97.87   242.709065
1  38.32 -100.50  33.31   -96.40   667.878723
2  38.32 -100.50  36.22  -100.01   237.080141
3  42.51  -97.39  37.65   -97.87   541.184297
4  42.51  -97.39  33.31   -96.40  1024.839512
5  42.51  -97.39  36.22  -100.01   733.819732
6  33.45 -103.21  37.65   -97.87   671.766908
7  33.45 -103.21  33.31   -96.40   633.751134
8  33.45 -103.21  36.22  -100.01   424.335874
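On pandas 1.2 or later, the cross join can probably be written without the temporary key column by passing how='cross' to merge. A minimal sketch, rebuilding the sample frames from the question (the name `pairs` is illustrative):

```python
import pandas as pd

# Sample frames from the question.
df1 = pd.DataFrame({'lat': [38.32, 42.51, 33.45],
                    'lon': [-100.50, -97.39, -103.21]})
df2 = pd.DataFrame({'lat': [37.65, 33.31, 36.22],
                    'lon': [-97.87, -96.40, -100.01]})

# how='cross' (pandas >= 1.2) pairs every row of df1 with every row of df2,
# so no helper column 'k' is needed. The suffixes rename the overlapping
# columns to lat1/lon1 and lat2/lon2.
pairs = df1.merge(df2, how='cross', suffixes=('1', '2'))
```

The resulting frame has the same four coordinate columns as the key-based join, so the distance function can be applied to it unchanged.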
Edit
As noted by @MaxU in the comments, you can use a numpy implementation of the Haversine formula in a similar manner for extra performance. This should be equivalent to the great_circle function in geopy.
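For illustration, here is one way such a NumPy Haversine function might look. This is a sketch, not the implementation from the comments: the function name `haversine_km` and the mean Earth radius constant are assumptions, and the formula treats the Earth as a sphere, so results will differ slightly from the vincenty values.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius in kilometers

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs are degrees, scalars or arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Sample coordinates from the question.
lat1 = np.array([38.32, 42.51, 33.45])
lon1 = np.array([-100.50, -97.39, -103.21])
lat2 = np.array([37.65, 33.31, 36.22])
lon2 = np.array([-97.87, -96.40, -100.01])

# Column vectors for the first set broadcast against the second set,
# so dist_matrix[i, j] is the distance from point i of the first set
# to point j of the second.
dist_matrix = haversine_km(lat1[:, None], lon1[:, None], lat2, lon2)
```

Because the whole computation is vectorized, it avoids the per-row Python call made by apply. The same function should also accept the lat1/lon1/lat2/lon2 columns of the cross-joined frame directly.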