Calculate distance between latitude and longitude in dataframe

I have 4 columns in my dataframe containing the following data:

Start_latitude
Start_longitude
Stop_latitude
Stop_longitude

I need to compute the distance between each start and stop latitude/longitude pair and store the result in a new column.

I came across a package (geopy) that can do this for me, but it expects each point as a tuple. How do I apply this geopy function across the dataframe in pandas for all the records?

Harikrishna asked Jun 08 '17 23:06


2 Answers

I'd recommend using pyproj instead of geopy. pyproj runs entirely locally, so it is fast and needs no internet connection (geopy's geocoders call online services, although its distance functions are also computed locally). pyproj is also transparent about its methods, which are based on the Proj4 codebase that underlies essentially all open-source GIS software and, probably, many of the web services you'd use.

#!/usr/bin/env python3

import pandas as pd
import numpy as np
from pyproj import Geod

# Distances are measured on the WGS84 ellipsoid, which is more accurate than a spherical model
wgs84_geod = Geod(ellps='WGS84')

# Get the distance in metres between pairs of lat-lon points
def Distance(lat1,lon1,lat2,lon2):
  # Geod.inv takes longitudes before latitudes, so this argument order is correct
  az12,az21,dist = wgs84_geod.inv(lon1,lat1,lon2,lat2)
  return dist

#Create test data
lat1 = np.random.uniform(-90,90,100)
lon1 = np.random.uniform(-180,180,100)
lat2 = np.random.uniform(-90,90,100)
lon2 = np.random.uniform(-180,180,100)

#Package as a dataframe
df = pd.DataFrame({'lat1':lat1,'lon1':lon1,'lat2':lat2,'lon2':lon2})

#Add/update a column to the data frame with the distances (in metres)
df['dist'] = Distance(df['lat1'].tolist(),df['lon1'].tolist(),df['lat2'].tolist(),df['lon2'].tolist())

The pyproj documentation describes Geod in more detail.

Richard answered Oct 16 '22 06:10


From the geopy documentation (https://pypi.python.org/pypi/geopy), you can compute the distance between two points like this:

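# Note: vincenty was removed in geopy 2.0; on newer versions use geopy.distance.geodesic instead
from geopy.distance import vincenty, great_circle

# Define the two points as (latitude, longitude) tuples, using your own coordinates
start = (start_latitude, start_longitude)
stop = (stop_latitude, stop_longitude)

# Print the Vincenty distance in metres
print(vincenty(start, stop).meters)

# Print the great-circle distance in metres
print(great_circle(start, stop).meters)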
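For intuition about what a great-circle distance measures, here is a minimal sketch of the haversine formula using only the standard library. It assumes a spherical Earth with a mean radius of 6,371,009 m, and the name haversine_m is made up for illustration. This is not geopy's implementation, and its results can differ from ellipsoidal (Vincenty) distances by a fraction of a percent.

```python
import math

EARTH_RADIUS_M = 6371009.0  # assumed mean Earth radius in metres (spherical model)

def haversine_m(start, stop):
    """Great-circle distance in metres between two (lat, lon) tuples given in degrees."""
    lat1, lon1 = map(math.radians, start)
    lat2, lon2 = map(math.radians, stop)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against rounding error before taking the arcsine
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

# One degree of longitude along the equator
print(haversine_m((0.0, 0.0), (0.0, 1.0)))
```

Like the geopy functions above, it takes each point as a (latitude, longitude) tuple, so it could be swapped in for a quick sanity check when geopy is not installed.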

To combine this with pandas, assuming you have a dataframe df, first define a function that computes the distance for a single row:

def distance_calc(row):
    # Adjust the column names to match your dataframe
    start = (row['start_latitude'], row['start_longitude'])
    stop = (row['stop_latitude'], row['stop_longitude'])

    return vincenty(start, stop).meters

And then apply it to the dataframe:

df['distance'] = df.apply(distance_calc, axis=1)

Note the axis=1 argument: it means the function is applied to each row rather than to each column.
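To make the effect of axis=1 concrete, the sketch below applies a function to a small made-up dataframe. The column names follow the question, while the coordinate values and the lat_change helper are invented for illustration. A simple latitude difference stands in for the real distance call.

```python
import pandas as pd

# Made-up sample data using the column names from the question
df = pd.DataFrame({
    'Start_latitude': [10.0, 20.0],
    'Start_longitude': [30.0, 40.0],
    'Stop_latitude': [11.0, 22.0],
    'Stop_longitude': [30.5, 41.0],
})

def lat_change(row):
    # With axis=1, `row` is a Series holding one record, indexed by column name
    return row['Stop_latitude'] - row['Start_latitude']

# axis=1: the function is called once per row, giving one value per record
df['lat_change'] = df.apply(lat_change, axis=1)

# axis=0 (the default): the function is called once per column instead
column_max = df.apply(lambda col: col.max(), axis=0)

print(df['lat_change'].tolist())
print(column_max['Stop_latitude'])
```

Row-wise apply calls the Python function once per record, so on large dataframes a vectorised call that takes whole columns at once will usually be faster.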

Remy Kabel answered Oct 16 '22 07:10