I have 4 columns in my dataframe containing the following data:
Start_latitude
Start_longitude
Stop_latitude
Stop_longitude
I need to compute the distance between each start/stop latitude-longitude pair and create a new column holding the result.
I came across a package (geopy) which can do this for me, but I need to pass it tuples. How do I apply the geopy function across the dataframe in pandas for all the records?
I'd recommend you use pyproj instead of geopy. geopy is primarily a client for online geocoding services (though its distance functions do run locally), whereas pyproj is entirely local, so it is fast and does not need an internet connection. pyproj is also more transparent about its methods (see here for instance), which are based on the Proj4 codebase that underlies essentially all open-source GIS software and, probably, many of the web services you'd use.
#!/usr/bin/env python3
import pandas as pd
import numpy as np
from pyproj import Geod

# Distances are measured on this ellipsoid, which is more accurate than a spherical method
wgs84_geod = Geod(ellps='WGS84')

# Get the distance between pairs of lat-lon points
def Distance(lat1, lon1, lat2, lon2):
    # Yes, this order is correct: inv() takes longitude before latitude
    az12, az21, dist = wgs84_geod.inv(lon1, lat1, lon2, lat2)
    return dist

# Create test data
lat1 = np.random.uniform(-90, 90, 100)
lon1 = np.random.uniform(-180, 180, 100)
lat2 = np.random.uniform(-90, 90, 100)
lon2 = np.random.uniform(-180, 180, 100)

# Package as a dataframe
df = pd.DataFrame({'lat1': lat1, 'lon1': lon1, 'lat2': lat2, 'lon2': lon2})

# Add/update a column in the dataframe with the distances (in metres)
df['dist'] = Distance(df['lat1'].tolist(), df['lon1'].tolist(), df['lat2'].tolist(), df['lon2'].tolist())
PyProj has some documentation here.
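As a rough sketch of how the same approach might look with the column names from the question, the snippet below passes the four columns to `Geod.inv` as NumPy arrays and stores the result in a new column. The sample coordinates and the `distance_m` column name are assumptions made for illustration only.

```python
import pandas as pd
from pyproj import Geod

# Ellipsoid used for the geodesic calculation
geod = Geod(ellps="WGS84")

# Illustrative data using the column names from the question
df = pd.DataFrame({
    "Start_latitude": [0.0, 48.8566],
    "Start_longitude": [0.0, 2.3522],
    "Stop_latitude": [0.0, 51.5074],
    "Stop_longitude": [1.0, -0.1278],
})

# inv() takes longitudes before latitudes and returns
# (forward azimuths, back azimuths, distances in metres)
_, _, dist_m = geod.inv(
    df["Start_longitude"].to_numpy(),
    df["Start_latitude"].to_numpy(),
    df["Stop_longitude"].to_numpy(),
    df["Stop_latitude"].to_numpy(),
)

# "distance_m" is a hypothetical column name
df["distance_m"] = dist_m
```

Because the whole column is handed to pyproj in one call, there is no per-row Python function call, which tends to matter on large dataframes.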
From the geopy documentation (https://pypi.python.org/pypi/geopy), you can compute the distance between two points like this:
from geopy.distance import geodesic, great_circle
# Note: vincenty() was removed in geopy 2.0; geodesic() is its replacement
# Define the two points as (latitude, longitude) tuples
start = (start_latitude, start_longitude)
stop = (stop_latitude, stop_longitude)
# Print the geodesic (ellipsoidal) distance
print(geodesic(start, stop).meters)
# Print the great-circle distance
print(great_circle(start, stop).meters)
To combine this with pandas, assuming you have a dataframe df, first create a function:
def distance_calc(row):
    start = (row['start_latitude'], row['start_longitude'])
    stop = (row['stop_latitude'], row['stop_longitude'])
    return geodesic(start, stop).meters
And then apply it to the dataframe:
df['distance'] = df.apply(distance_calc, axis=1)
Note the axis=1 argument: it means the function is applied to each row rather than to each column.
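To make the great-circle figure less of a black box, here is a minimal standard-library sketch of the haversine formula, which treats the Earth as a sphere. The function name `haversine_m` and the mean radius of 6371008.8 m are assumptions for illustration; geopy's own constant and implementation may differ slightly, so expect small discrepancies from `great_circle`.

```python
import math

# Assumed mean Earth radius in metres (spherical approximation)
EARTH_RADIUS_M = 6371008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


# Rows as (start_lat, start_lon, stop_lat, stop_lon) tuples
rows = [(0.0, 0.0, 0.0, 1.0), (48.8566, 2.3522, 51.5074, -0.1278)]
distances = [haversine_m(*row) for row in rows]
```

A spherical result like this can differ from the ellipsoidal distance by up to roughly half a percent, which is why the ellipsoidal methods above are usually preferred when accuracy matters.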