
Find the nearest location using numpy

I have two sets of geocodes stored as pandas Series, and I am trying to find the fastest way to get, for each point in set A, the point in set B at the minimum Euclidean distance. That is: the closest point in the second set to (40.748043, -73.992953), and so on. I would really appreciate any suggestions or help.

Set A:
    print(latitude1)
    print(longitude1)

    0    40.748043
    1    42.361016
    Name: latitude, dtype: float64
    0    -73.992953
    1    -71.020005
    Name: longitude, dtype: float64

Set B:
    print(latitude2)
    print(longitude2)

    0    42.50729
    1    42.50779
    2    25.56473
    3    25.78953
    4    25.33132
    5    25.06570
    6    25.59246
    7    25.61955
    8    25.33737
    9    24.11028
    Name: latitude, dtype: float64
    0     1.53414
    1     1.52109
    2    55.55517
    3    55.94320
    4    56.34199
    5    55.17128
    6    56.26176
    7    56.27291
    8    55.41206
    9    52.73056
    Name: longitude, dtype: float64
Vijay asked Mar 16 '18 14:03




1 Answer

This is one way using just numpy.linalg.norm.

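import numpy as np
import pandas as pd

# df1 and df2 are built in the Setup section below.
# Pair each latitude with its longitude as a (lat, lon) tuple.
df1['coords1'] = list(zip(df1['latitude1'], df1['longitude1']))
df2['coords2'] = list(zip(df2['latitude2'], df2['longitude2']))

def calc_min(x):
    # Euclidean distance from x to every point in set B; keep the nearest.
    amin = np.argmin([np.linalg.norm(np.array(x)-np.array(y)) for y in df2['coords2']])
    return df2['coords2'].iloc[amin]

# Look up the closest set-B point for each point in set A.
df1['closest'] = df1['coords1'].map(calc_min)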

#    latitude1  longitude1                  coords1              closest
# 0  40.748043  -73.992953  (40.748043, -73.992953)  (42.50779, 1.52109)
# 1  42.361016  -71.020005  (42.361016, -71.020005)  (42.50779, 1.52109)
# 2  25.361016   54.000000        (25.361016, 54.0)  (25.0657, 55.17128)

Setup

from io import StringIO

mystr1 = """latitude1|longitude1
40.748043|-73.992953
42.361016|-71.020005
25.361016|54.0000
"""

mystr2 = """latitude2|longitude2
42.50729|1.53414
42.50779|1.52109
25.56473|55.55517
25.78953|55.94320
25.33132|56.34199
25.06570|55.17128
25.59246|56.26176
25.61955|56.27291
25.33737|55.41206
24.11028|52.73056"""

df1 = pd.read_csv(StringIO(mystr1), sep='|')
df2 = pd.read_csv(StringIO(mystr2), sep='|')

If performance is an issue, you can vectorize this calculation fairly easily via the underlying numpy arrays.
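
As a rough sketch of what that vectorization could look like (this is one possible approach using NumPy broadcasting, not necessarily what the answer had in mind), the two coordinate columns of each frame can be pulled out as arrays and all pairwise distances computed at once. The variable names `a`, `b`, `diff`, `dist`, `nearest_idx` and `closest` are made up for illustration, and the data is copied from the Setup section so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data copied from the Setup section.
df1 = pd.DataFrame({
    "latitude1": [40.748043, 42.361016, 25.361016],
    "longitude1": [-73.992953, -71.020005, 54.0],
})
df2 = pd.DataFrame({
    "latitude2": [42.50729, 42.50779, 25.56473, 25.78953, 25.33132,
                  25.06570, 25.59246, 25.61955, 25.33737, 24.11028],
    "longitude2": [1.53414, 1.52109, 55.55517, 55.94320, 56.34199,
                   55.17128, 56.26176, 56.27291, 55.41206, 52.73056],
})

a = df1[["latitude1", "longitude1"]].to_numpy()  # shape (n, 2)
b = df2[["latitude2", "longitude2"]].to_numpy()  # shape (m, 2)

# Broadcasting (n, 1, 2) against (1, m, 2) gives every pairwise difference.
diff = a[:, None, :] - b[None, :, :]             # shape (n, m, 2)

# Euclidean norm over the last axis: dist[i, j] is the distance from a[i] to b[j].
dist = np.linalg.norm(diff, axis=2)              # shape (n, m)

# Index of the nearest set-B point for each set-A point.
nearest_idx = dist.argmin(axis=1)
closest = b[nearest_idx]                         # shape (n, 2)

# Store the result as (lat, lon) tuples, matching the 'closest' column above.
df1["closest"] = list(map(tuple, closest))
print(df1)
```

Note that the intermediate `diff` array holds n × m × 2 floats, so memory use grows with the product of the two set sizes; for very large inputs the distances would need to be computed in chunks.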

jpp answered Sep 28 '22 04:09