I have a .csv file that contains city, latitude and longitude data in the below format:
CITY|LATITUDE|LONGITUDE
A|40.745392|-73.978364
B|42.562786|-114.460503
C|37.227928|-77.401924
D|41.245708|-75.881241
E|41.308273|-72.927887
I need to create a distance matrix in the below format (please ignore the dummy values):
         A         B         C         D         E   
A  0.000000  6.000000  5.744563  6.082763  5.656854  
B  6.000000  0.000000  6.082763  5.385165  5.477226  
C  1.744563  6.082763  0.000000  6.000000  5.385165
D  6.082763  5.385165  6.000000  0.000000  5.385165  
E  5.656854  5.477226  5.385165  5.385165  0.000000  
I have loaded the data into a pandas dataframe and have created a cross join as below:
import pandas as pd
df_A = pd.read_csv('lat_lon.csv', delimiter='|', encoding="utf-8-sig")
df_B = df_A
df_A['key'] = 1
df_B['key'] = 1 
df_C = pd.merge(df_A, df_B, on='key')  
The Euclidean distance is simply the square root of the squared differences between corresponding elements of the rows (or columns). This is probably the most commonly used distance metric.
You can use pdist and squareform methods from scipy.spatial.distance:
In [12]: df
Out[12]:
  CITY   LATITUDE   LONGITUDE
0    A  40.745392  -73.978364
1    B  42.562786 -114.460503
2    C  37.227928  -77.401924
3    D  41.245708  -75.881241
4    E  41.308273  -72.927887
In [13]: from scipy.spatial.distance import squareform, pdist
In [14]: pd.DataFrame(squareform(pdist(df.iloc[:, 1:])), columns=df.CITY.unique(), index=df.CITY.unique())
Out[14]:
           A          B          C          D          E
A   0.000000  40.522913   4.908494   1.967551   1.191779
B  40.522913   0.000000  37.440606  38.601738  41.551558
C   4.908494  37.440606   0.000000   4.295932   6.055264
D   1.967551  38.601738   4.295932   0.000000   2.954017
E   1.191779  41.551558   6.055264   2.954017   0.000000
                        for i in df["CITY"]:
    for j in df["CITY"]:
        row = df[df["CITY"] == j][["LATITUDE", "LONGITUDE"]]
        latitude = row["LATITUDE"].tolist()[0]
        longitude = row["LONGITUDE"].tolist()[0]
        df.loc[df['CITY'] == i, j] = ((df["LATITUDE"] - latitude)**2 + (df["LONGITUDE"] - longitude)**2)**0.5
df = df.drop(["CITY", "LATITUDE", "LONGITUDE"], axis=1)
This works
the matrix can be directly created with cdist in scipy.spatial.distance:
from scipy.spatial.distance import cdist
df_array = df[["LATITUDE", "LONGITUDE"]].to_numpy()
dist_mat = cdist(df_array, df_array)
pd.DataFrame(dist_mat, columns = df["CITY"], index = df["CITY"])
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With