
Vectorised Haversine formula with a pandas dataframe

I know that to find the distance between two latitude/longitude points I need to use the haversine function:

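from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # haversine formula
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    # 6367 km is the Earth radius this code uses; the mean radius is about 6371 km
    km = 6367 * c
    return km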
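To check the function on a single pair of points, here is a minimal sketch that calls it with the fixed point from the question and the first row of the example data below. The function is repeated so the snippet runs on its own; note that the argument order is (lon, lat). Since it is the same formula, the result should agree with the first row of the distance column in the answer.

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6367 * 2 * asin(sqrt(a))

# fixed point from the question (lon, lat) vs. row 1 of the example (LON, LAT)
d = haversine(-56.7213600, 37.2175900, 28.3774110, 58.7312210)
print(round(d, 2))
```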

I have a DataFrame where one column is latitude and another column is longitude. I want to find out how far each of these points is from a fixed point, -56.7213600, 37.2175900 (longitude, latitude). How do I take the values from the DataFrame and pass them into the function?
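Part of the difficulty is that the function above uses the math module, which only works on single floats, so it cannot be handed a whole DataFrame column. A small sketch of that failure mode, assuming pandas and numpy are installed (the two-value Series is made up from the example latitudes); numpy's element-wise functions such as np.radians accept the column instead:

```python
from math import radians

import numpy as np
import pandas as pd

# two latitudes taken from the example table
lat = pd.Series([58.7312210, 56.8148320])

# math.radians expects a single float, so a whole column is rejected
try:
    radians(lat)
    math_accepts_series = True
except TypeError as exc:
    math_accepts_series = False
    print("math.radians failed:", exc)

# numpy ufuncs work element-wise and return a Series of the same length
lat_rad = np.radians(lat)
print(lat_rad)
```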

Example DataFrame:

      SEAZ         LAT         LON
1   296.40  58.7312210  28.3774110
2   274.72  56.8148320  31.2923240
3   192.25  52.0649880  35.8018640
4    34.34  68.8188750  67.1933670
5   271.05  56.6699880  31.6880620
6   131.88  48.5546220  49.7827730
7   350.71  64.7742720  31.3953780
8   214.44  53.5192920  33.8458560
9     1.46  67.9433740  38.4842520
10  273.55  53.3437310   4.4716664
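To make the code on this page reproducible, here is one way to build the example DataFrame in pandas. The values are copied from the table above, and the 1-based index mirrors the listing; this construction is an assumption, since the question does not show how its DataFrame was created.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "SEAZ": [296.40, 274.72, 192.25, 34.34, 271.05, 131.88, 350.71, 214.44, 1.46, 273.55],
        "LAT": [58.7312210, 56.8148320, 52.0649880, 68.8188750, 56.6699880,
                48.5546220, 64.7742720, 53.5192920, 67.9433740, 53.3437310],
        "LON": [28.3774110, 31.2923240, 35.8018640, 67.1933670, 31.6880620,
                49.7827730, 31.3953780, 33.8458560, 38.4842520, 4.4716664],
    },
    index=range(1, 11),  # rows numbered 1..10 as in the question
)
print(df)
```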
asked Sep 10 '14 14:09 by user3755536


People also ask

How does Haversine formula work?

The haversine formula determines the great-circle distance between two points on a sphere given their longitudes and latitudes. Important in navigation, it is a special case of a more general formula in spherical trigonometry, the law of haversines, that relates the sides and angles of spherical triangles.
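Written out with latitudes φ and longitudes λ in radians and sphere radius r, the formula that the code on this page implements is shown below; c is the central angle between the two points, and the code uses r = 6367 km.

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2-\varphi_1}{2}\right)
   + \cos\varphi_1\,\cos\varphi_2\,\sin^2\!\left(\frac{\lambda_2-\lambda_1}{2}\right)\\
c &= 2\arcsin\sqrt{a}\\
d &= r\,c
\end{aligned}
```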

What is Haversine in Python?

In scikit-learn, sklearn.metrics.pairwise.haversine_distances computes the Haversine (or great circle) distance, which is the angular distance between two points on the surface of a sphere. The first coordinate of each point is assumed to be the latitude and the second the longitude, both given in radians, so the data must have exactly two columns.
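A plain-Python sketch of that convention, which does not call scikit-learn: points are (lat, lon) pairs in radians, the return value is an angle, and multiplying by a radius turns it into a distance. The helper name angular_haversine and the 6371 km radius are illustrative assumptions.

```python
from math import asin, cos, sin, sqrt, pi

def angular_haversine(p1, p2):
    """Hypothetical helper: central angle (radians) between two (lat, lon) points in radians."""
    lat1, lon1 = p1  # latitude first, longitude second
    lat2, lon2 = p2
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a))

# a quarter of the way round the equator
angle = angular_haversine((0.0, 0.0), (0.0, pi / 2))
km = angle * 6371.0  # scale by an assumed mean Earth radius in km
print(angle, km)
```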


1 Answer

I can't confirm whether the calculations are correct, but the following works:

In [11]:

from numpy import cos, sin, arcsin, sqrt
from math import radians

def haversine(row):
    # fixed reference point from the question
    lon1 = -56.7213600
    lat1 = 37.2175900
    # the point stored in this row
    lon2 = row['LON']
    lat2 = row['LAT']
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * arcsin(sqrt(a))
    km = 6367 * c
    return km

# call the function once per row (axis=1); it can be passed directly, no lambda needed
df['distance'] = df.apply(haversine, axis=1)
df
Out[11]:
         SEAZ        LAT        LON     distance
index                                           
1      296.40  58.731221  28.377411  6275.791920
2      274.72  56.814832  31.292324  6509.727368
3      192.25  52.064988  35.801864  6990.144378
4       34.34  68.818875  67.193367  7357.221846
5      271.05  56.669988  31.688062  6538.047542
6      131.88  48.554622  49.782773  8036.968198
7      350.71  64.774272  31.395378  6229.733699
8      214.44  53.519292  33.845856  6801.670843
9        1.46  67.943374  38.484252  6418.754323
10     273.55  53.343731   4.471666  4935.394528
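The row function above hard-codes the fixed point. A variant, sketched here assuming the columns are named LAT and LON as in the question, takes the reference point as keyword arguments instead; DataFrame.apply forwards extra keyword arguments to the function it calls. The name haversine_row is illustrative only.

```python
import numpy as np
import pandas as pd

def haversine_row(row, lon1, lat1, radius=6367):
    """Hypothetical helper: km from (lon1, lat1) to the row's (LON, LAT), all in degrees."""
    lon1, lat1 = np.radians(lon1), np.radians(lat1)
    lon2, lat2 = np.radians(row["LON"]), np.radians(row["LAT"])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return radius * 2 * np.arcsin(np.sqrt(a))

# first two rows of the example data
df = pd.DataFrame({"LAT": [58.7312210, 56.8148320], "LON": [28.3774110, 31.2923240]}, index=[1, 2])

# lon1/lat1 are not apply() parameters, so they are passed on to haversine_row
df["distance"] = df.apply(haversine_row, axis=1, lon1=-56.7213600, lat1=37.2175900)
print(df)
```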

The following vectorised code is actually slower on a DataFrame this small, but I timed it on a 100,000-row df:

In [35]:

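%%timeit
# assumes `import math` and `import numpy as np` ran in an earlier cell
df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LON'])
# differences from the fixed point (lon -56.7213600, lat 37.2175900), in radians
df['dLON'] = df['LON_rad'] - math.radians(-56.7213600)
df['dLAT'] = df['LAT_rad'] - math.radians(37.2175900)
# haversine formula evaluated on whole columns at once
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(
    np.sin(df['dLAT']/2)**2
    + math.cos(math.radians(37.2175900)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))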

1 loops, best of 3: 17.2 ms per loop

Compared with the apply version, which took 4.3 s, this is nearly 250 times quicker, something to note for the future.
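The column-by-column steps above can be packaged into a reusable function. A minimal sketch, assuming numpy and pandas: because np.radians, np.sin and friends are element-wise, the same code accepts scalars or whole Series, and the scalar fixed point is broadcast against every row. The name haversine_np is illustrative, not part of any library.

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2, radius=6367):
    """Hypothetical helper: great-circle distance in km; degrees in, scalars or arrays."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return radius * 2 * np.arcsin(np.sqrt(a))

# rows 1 and 10 of the example data
df = pd.DataFrame({"LAT": [58.7312210, 53.3437310], "LON": [28.3774110, 4.4716664]})

# one call computes the whole column; no per-row Python function calls
df["distance"] = haversine_np(-56.7213600, 37.2175900, df["LON"], df["LAT"])
print(df)
```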

If we compress all of the above into a one-liner:

In [39]:

%timeit df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin((np.radians(df['LAT']) - math.radians(37.2175900))/2)**2 + math.cos(math.radians(37.2175900)) * np.cos(np.radians(df['LAT'])) * np.sin((np.radians(df['LON']) - math.radians(-56.7213600))/2)**2))
100 loops, best of 3: 12.6 ms per loop

This gives a further speed-up: about 341 times quicker than apply.
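To reproduce a comparison like this outside IPython, here is a rough sketch using the standard timeit module on synthetic data. The random coordinates, the 10,000-row size and the helper names are assumptions for illustration; timings depend on the machine and pandas version, so none are quoted here. The sketch also checks that the row-wise and vectorised versions return the same distances.

```python
import timeit

import numpy as np
import pandas as pd

LON0, LAT0 = -56.7213600, 37.2175900  # fixed point from the question

def row_distance(row):
    """Row-wise version, called once per row by DataFrame.apply."""
    lon1, lat1, lon2, lat2 = map(np.radians, [LON0, LAT0, row["LON"], row["LAT"]])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))

def column_distance(frame):
    """Vectorised version operating on whole columns."""
    lat1, lon1 = np.radians(LAT0), np.radians(LON0)
    lat2, lon2 = np.radians(frame["LAT"]), np.radians(frame["LON"])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))

rng = np.random.default_rng(0)
n = 10_000  # smaller than the answer's 100,000 rows to keep the run short
df = pd.DataFrame({"LAT": rng.uniform(-90, 90, n), "LON": rng.uniform(-180, 180, n)})

slow = df.apply(row_distance, axis=1)
fast = column_distance(df)
same = np.allclose(slow, fast)

t_apply = timeit.timeit(lambda: df.apply(row_distance, axis=1), number=1)
t_vec = timeit.timeit(lambda: column_distance(df), number=10) / 10
print(f"apply: {t_apply:.3f} s, vectorised: {t_vec * 1000:.2f} ms, same result: {same}")
```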

answered Sep 30 '22 12:09 by EdChum