Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to calculate distance matrix given latitude and longitude data in Python

I have data for latitude and longitude, and I need to calculate distance matrix between two arrays containing locations. I used this This to get distance between two locations given latitude and longitude.

Here is an example of my code:

import numpy as np
import math

def get_distances(locs_1, locs_2):
    n_rows_1 = locs_1.shape[0]
    n_rows_2 = locs_2.shape[0]
    dists = np.empty((n_rows_1, n_rows_2))
    # The loops here are inefficient
    for i in xrange(n_rows_1):
        for j in xrange(n_rows_2):
            dists[i, j] = get_distance_from_lat_long(locs_1[i], locs_2[j])
    return dists


def get_distance_from_lat_long(loc_1, loc_2):

    earth_radius = 3958.75

    lat_dif = math.radians(loc_1[0] - loc_2[0])
    long_dif = math.radians(loc_1[1] - loc_2[1])
    sin_d_lat = math.sin(lat_dif / 2)
    sin_d_long = math.sin(long_dif / 2)
    step_1 = (sin_d_lat ** 2) + (sin_d_long ** 2) * math.cos(math.radians(loc_1[0])) * math.cos(math.radians(loc_2[0])) 
    step_2 = 2 * math.atan2(math.sqrt(step_1), math.sqrt(1-step_1))
    dist = step_2 * earth_radius

    return dist

My expected output is this:

>>> locations_1 = np.array([[34, -81], [32, -87], [35, -83]])
>>> locations_2 = np.array([[33, -84], [39, -81], [40, -88], [30, -80]])
>>> get_distances(locations_1, locations_2)
array([[ 186.13522573,  345.46610882,  566.23466349,  282.51056676],
       [ 187.96657622,  589.43369894,  555.55312473,  436.88855214],
       [ 149.5853537 ,  297.56950329,  440.81203371,  387.12153747]])

Performance is important for me, and one thing I could do is use Cython to speed up the loops, but it would be nice if I don't have to go there.

Is there a module that can do something like this? Or any other solution?

like image 434
Akavall Avatar asked Oct 16 '13 20:10

Akavall


People also ask

How do I get the distance between two latitude and longitude in Python?

The math. dist() method returns the Euclidean distance between two points (p and q), where p and q are the coordinates of that point.

Can you calculate distance based on latitude and longitude?

One of the most common ways to calculate distances using latitude and longitude is the haversine formula, which is used to measure distances on a sphere. This method uses spherical triangles and measures the sides and angles of each to calculate the distance between points.

How do you calculate distance between coordinates in Python?

Euclidean distance using math library You can use the math. dist() function to get the Euclidean distance between two points in Python. For example, let's use it the get the distance between two 3-dimensional points each represented by a tuple.


2 Answers

It is more efiicient when using meshgrid to replace the double for loop:

import numpy as np

earth_radius = 3958.75

def get_distances(locs_1, locs_2):
   lats1, lats2 = np.meshgrid(locs_1[:,0], locs_2[:,0])
   lons1, lons2 = np.meshgrid(locs_1[:,1], locs_2[:,1])

   lat_dif = np.radians(lats1 - lats2)
   long_dif = np.radians(lons1 - lons2)

   sin_d_lat = np.sin(lat_dif / 2.)
   sin_d_long = np.sin(long_dif / 2.)

   step_1 = (sin_d_lat ** 2) + (sin_d_long ** 2) * np.cos(np.radians(lats1[0])) * np.cos(np.radians(lats2[0])) 
   step_2 = 2 * np.arctan2(np.sqrt(step_1), np.sqrt(1-step_1))

   dist = step_2 * earth_radius

   return dist
like image 170
ManuGoacolou Avatar answered Oct 05 '22 23:10

ManuGoacolou


There's a lot of suboptimal things in the Haversine equations you are using. You can trim some of that and minimize the number of sines, cosines and square roots you need to calculate. The following is the best I have been able to come up with, and on my system runs about 5x faster than Ophion's code (which does mostly the same as far as vectorization goes) on two random arrays of 1000 and 2000 elements:

def spherical_dist(pos1, pos2, r=3958.75):
    pos1 = pos1 * np.pi / 180
    pos2 = pos2 * np.pi / 180
    cos_lat1 = np.cos(pos1[..., 0])
    cos_lat2 = np.cos(pos2[..., 0])
    cos_lat_d = np.cos(pos1[..., 0] - pos2[..., 0])
    cos_lon_d = np.cos(pos1[..., 1] - pos2[..., 1])
    return r * np.arccos(cos_lat_d - cos_lat1 * cos_lat2 * (1 - cos_lon_d))

If you feed it your two arrays "as is" it will complain, but that's not a bug, it's a feature. Basically, this function computes the distance on a sphere over the last dimension, and broadcasts on the rest. So you can get what you are after as:

>>> spherical_dist(locations_1[:, None], locations_2)
array([[ 186.13522573,  345.46610882,  566.23466349,  282.51056676],
       [ 187.96657622,  589.43369894,  555.55312473,  436.88855214],
       [ 149.5853537 ,  297.56950329,  440.81203371,  387.12153747]])

But it could also be used to calculate the distances between two lists of points, i.e.:

>>> spherical_dist(locations_1, locations_2[:-1])
array([ 186.13522573,  589.43369894,  440.81203371])

Or between two single points:

>>> spherical_dist(locations_1[0], locations_2[0])
186.1352257300577

This is inspired on how gufuncs work, and once you get used to it, I have found it to be a wonderful "swiss army knife" coding style, that lets you reuse a single function in lots of different settings.

like image 41
Jaime Avatar answered Oct 06 '22 01:10

Jaime