I have a .csv file that contains city, latitude and longitude data in the following format:
CITY|LATITUDE|LONGITUDE
A|40.745392|-73.978364
B|42.562786|-114.460503
C|37.227928|-77.401924
D|41.245708|-75.881241
E|41.308273|-72.927887
I need to create a distance matrix in the following format (please ignore the dummy values):
          A         B         C         D         E
A  0.000000  6.000000  5.744563  6.082763  5.656854
B  6.000000  0.000000  6.082763  5.385165  5.477226
C  1.744563  6.082763  0.000000  6.000000  5.385165
D  6.082763  5.385165  6.000000  0.000000  5.385165
E  5.656854  5.477226  5.385165  5.385165  0.000000
I have loaded the data into a pandas dataframe and created a cross join as follows:
import pandas as pd
df_A = pd.read_csv('lat_lon.csv', delimiter='|', encoding="utf-8-sig")
df_B = df_A.copy()  # copy, so df_B is a separate frame rather than another name for df_A
df_A['key'] = 1
df_B['key'] = 1
df_C = pd.merge(df_A, df_B, on='key')
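One way to carry the cross join through to the matrix, as a minimal sketch: compute the distance for each pair, then pivot the pairs into a square table. The data is inlined here instead of being read from lat_lon.csv, and the names keyed, pairs, DIST and matrix are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same rows as lat_lon.csv, inlined so the snippet runs on its own.
df = pd.DataFrame({
    "CITY": ["A", "B", "C", "D", "E"],
    "LATITUDE": [40.745392, 42.562786, 37.227928, 41.245708, 41.308273],
    "LONGITUDE": [-73.978364, -114.460503, -77.401924, -75.881241, -72.927887],
})

# Cross join via a constant key: every city paired with every city (25 rows).
keyed = df.assign(key=1)
pairs = keyed.merge(keyed, on="key", suffixes=("_x", "_y")).drop(columns="key")

# Euclidean distance for each pair, computed on the raw degree values.
pairs["DIST"] = np.hypot(
    pairs["LATITUDE_x"] - pairs["LATITUDE_y"],
    pairs["LONGITUDE_x"] - pairs["LONGITUDE_y"],
)

# Reshape the long list of pairs into a square CITY x CITY matrix.
matrix = pairs.pivot(index="CITY_x", columns="CITY_y", values="DIST")
print(matrix.round(6))
```

The pivot step is what turns the 25 pair rows into the 5 x 5 layout asked for; the cross join itself only produces the pairs.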
The Euclidean distance between two rows is simply the square root of the sum of the squared differences between their corresponding elements. This is probably the most commonly used distance metric.
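As a small worked example of that definition, here is the distance between cities A and E from the question, computed by hand with the standard library only. The variable names a, e and dist are just for illustration.

```python
import math

# Coordinates of cities A and E from the question, as (latitude, longitude).
a = (40.745392, -73.978364)
e = (41.308273, -72.927887)

# Square each coordinate difference, sum the squares, then take the square root.
dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, e)))
print(round(dist, 6))
```

The result should agree with the A/E entry in the pdist output further down. Note that it is a distance in degrees, since the inputs are raw latitude and longitude values.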
You can use the pdist and squareform functions from scipy.spatial.distance:
In [12]: df
Out[12]:
  CITY   LATITUDE   LONGITUDE
0    A  40.745392  -73.978364
1    B  42.562786 -114.460503
2    C  37.227928  -77.401924
3    D  41.245708  -75.881241
4    E  41.308273  -72.927887
In [13]: from scipy.spatial.distance import squareform, pdist
In [14]: pd.DataFrame(squareform(pdist(df.iloc[:, 1:])), columns=df.CITY.unique(), index=df.CITY.unique())
Out[14]:
           A          B          C          D          E
A   0.000000  40.522913   4.908494   1.967551   1.191779
B  40.522913   0.000000  37.440606  38.601738  41.551558
C   4.908494  37.440606   0.000000   4.295932   6.055264
D   1.967551  38.601738   4.295932   0.000000   2.954017
E   1.191779  41.551558   6.055264   2.954017   0.000000
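# For every pair of cities (i, j), store the distance from i to j in column j.
for i in df["CITY"]:
    for j in df["CITY"]:
        # Coordinates of city j.
        row = df[df["CITY"] == j][["LATITUDE", "LONGITUDE"]]
        latitude = row["LATITUDE"].tolist()[0]
        longitude = row["LONGITUDE"].tolist()[0]
        # Distances from all cities to j; .loc keeps only the value for row i.
        df.loc[df['CITY'] == i, j] = ((df["LATITUDE"] - latitude)**2 + (df["LONGITUDE"] - longitude)**2)**0.5
# Keep only the distance columns.
df = df.drop(["CITY", "LATITUDE", "LONGITUDE"], axis=1)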
This works, but the nested Python loop gets slow as the number of cities grows.
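The same matrix can also be built without explicit loops by letting NumPy broadcast the coordinate array against itself. This is a sketch rather than part of the answer above; the data is inlined, and coords, diff, dist and result are illustrative names.

```python
import numpy as np
import pandas as pd

# Same rows as in the question, inlined so the snippet runs on its own.
df = pd.DataFrame({
    "CITY": ["A", "B", "C", "D", "E"],
    "LATITUDE": [40.745392, 42.562786, 37.227928, 41.245708, 41.308273],
    "LONGITUDE": [-73.978364, -114.460503, -77.401924, -75.881241, -72.927887],
})

coords = df[["LATITUDE", "LONGITUDE"]].to_numpy()  # shape (n, 2)

# diff[i, j] holds coords[i] - coords[j]; broadcasting gives shape (n, n, 2).
diff = coords[:, None, :] - coords[None, :, :]

# Sum the squared differences over the last axis and take the square root.
dist = np.sqrt((diff ** 2).sum(axis=-1))

result = pd.DataFrame(dist, index=df["CITY"], columns=df["CITY"])
print(result.round(6))
```

Unlike the loop version, this keeps the city names as the row index. The intermediate array has n * n * 2 entries, so memory use grows quadratically with the number of cities.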
The matrix can be created directly with cdist from scipy.spatial.distance:
from scipy.spatial.distance import cdist
df_array = df[["LATITUDE", "LONGITUDE"]].to_numpy()
dist_mat = cdist(df_array, df_array)
pd.DataFrame(dist_mat, columns = df["CITY"], index = df["CITY"])
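As a quick sanity check, the cdist result can be compared against squareform(pdist(...)). A minimal sketch with the question's data inlined; via_cdist, via_pdist and same are names made up for this example.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist, pdist, squareform

# Same rows as in the question, inlined so the snippet runs on its own.
df = pd.DataFrame({
    "CITY": ["A", "B", "C", "D", "E"],
    "LATITUDE": [40.745392, 42.562786, 37.227928, 41.245708, 41.308273],
    "LONGITUDE": [-73.978364, -114.460503, -77.401924, -75.881241, -72.927887],
})
coords = df[["LATITUDE", "LONGITUDE"]].to_numpy()

# cdist computes all n * n pairs between two point sets (here the same set twice).
via_cdist = cdist(coords, coords)

# pdist computes each unordered pair once; squareform expands it to a full matrix.
via_pdist = squareform(pdist(coords))

# Both default to the Euclidean metric, so the matrices should agree.
same = np.allclose(via_cdist, via_pdist)
print(same)
```

For a single set of points, pdist does roughly half the work of cdist; cdist is the more natural choice when the rows and columns come from two different sets of points.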