Euclidean Distance Matrix Using Pandas

I have a .csv file that contains city, latitude, and longitude data in the following format:

CITY|LATITUDE|LONGITUDE
A|40.745392|-73.978364
B|42.562786|-114.460503
C|37.227928|-77.401924
D|41.245708|-75.881241
E|41.308273|-72.927887

I need to create a distance matrix in the following format (please ignore the dummy values):

         A         B         C         D         E   
A  0.000000  6.000000  5.744563  6.082763  5.656854  
B  6.000000  0.000000  6.082763  5.385165  5.477226  
C  1.744563  6.082763  0.000000  6.000000  5.385165
D  6.082763  5.385165  6.000000  0.000000  5.385165  
E  5.656854  5.477226  5.385165  5.385165  0.000000

I have loaded the data into a pandas DataFrame and created a cross join as follows:

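import pandas as pd

df_A = pd.read_csv('lat_lon.csv', delimiter='|', encoding="utf-8-sig")
df_B = df_A.copy()  # copy, so df_B is not just another name for df_A
# Constant key so that the merge pairs every row with every row (cross join)
df_A['key'] = 1
df_B['key'] = 1
df_C = pd.merge(df_A, df_B, on='key')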

Can you please help me create the above matrix structure?
Also, is it possible to avoid the step involving the cross join?

272

asked Aug 29 '16 10:08

Abacus

3 Answers

You can use the pdist and squareform functions from scipy.spatial.distance:

In [12]: df
Out[12]:
  CITY   LATITUDE   LONGITUDE
0    A  40.745392  -73.978364
1    B  42.562786 -114.460503
2    C  37.227928  -77.401924
3    D  41.245708  -75.881241
4    E  41.308273  -72.927887

In [13]: from scipy.spatial.distance import squareform, pdist

In [14]: pd.DataFrame(squareform(pdist(df.iloc[:, 1:])), columns=df.CITY.unique(), index=df.CITY.unique())
Out[14]:
           A          B          C          D          E
A   0.000000  40.522913   4.908494   1.967551   1.191779
B  40.522913   0.000000  37.440606  38.601738  41.551558
C   4.908494  37.440606   0.000000   4.295932   6.055264
D   1.967551  38.601738   4.295932   0.000000   2.954017
E   1.191779  41.551558   6.055264   2.954017   0.000000
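
For a copy-paste version of the session above, here is a minimal sketch. It assumes the sample rows from the question are read from an in-memory string rather than from lat_lon.csv; the names raw, coords, and dist are only illustrative.

```python
import io

import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Sample rows from the question, inlined so the sketch runs without a file.
raw = """CITY|LATITUDE|LONGITUDE
A|40.745392|-73.978364
B|42.562786|-114.460503
C|37.227928|-77.401924
D|41.245708|-75.881241
E|41.308273|-72.927887
"""
df = pd.read_csv(io.StringIO(raw), delimiter="|")

# Index by city so the labels carry through to the result.
coords = df.set_index("CITY")[["LATITUDE", "LONGITUDE"]]

# pdist returns the condensed pairwise distances; squareform expands them
# into a symmetric n x n matrix with zeros on the diagonal.
dist = pd.DataFrame(
    squareform(pdist(coords.to_numpy())),
    index=coords.index,
    columns=coords.index,
)
print(dist.round(6))
```

Because the inputs are raw latitude and longitude, the resulting values are in degrees, not kilometres or miles.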

155

answered Oct 01 '22 13:10

MaxU - stop WAR against UA

# Each pass adds one column: the distance from every city to city j.
for j in df["CITY"]:
    row = df.loc[df["CITY"] == j, ["LATITUDE", "LONGITUDE"]]
    latitude = row["LATITUDE"].iloc[0]
    longitude = row["LONGITUDE"].iloc[0]
    df[j] = ((df["LATITUDE"] - latitude)**2 + (df["LONGITUDE"] - longitude)**2)**0.5

# Use the city names as row labels and keep only the distance columns.
df = df.set_index("CITY").drop(["LATITUDE", "LONGITUDE"], axis=1)

This works with plain pandas; no SciPy is needed.
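
If a loop-free version without SciPy is preferred, the same arithmetic can be written with NumPy broadcasting. This is only a sketch under the assumption that the sample rows from the question are built inline; xy, diff, and dist are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, built inline so the sketch is runnable.
df = pd.DataFrame({
    "CITY": ["A", "B", "C", "D", "E"],
    "LATITUDE": [40.745392, 42.562786, 37.227928, 41.245708, 41.308273],
    "LONGITUDE": [-73.978364, -114.460503, -77.401924, -75.881241, -72.927887],
})

xy = df[["LATITUDE", "LONGITUDE"]].to_numpy()

# xy[:, None, :] has shape (n, 1, 2) and xy[None, :, :] has shape (1, n, 2),
# so the subtraction broadcasts to every pair of rows: shape (n, n, 2).
diff = xy[:, None, :] - xy[None, :, :]

# Sum the squared differences over the coordinate axis and take the root.
dist = pd.DataFrame(
    np.sqrt((diff ** 2).sum(axis=-1)),
    index=df["CITY"].to_numpy(),
    columns=df["CITY"].to_numpy(),
)
print(dist.round(6))
```

The intermediate array holds n * n * 2 floats, so this trades memory for speed and is best suited to a modest number of cities.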

answered Oct 01 '22 13:10

Himaprasoon

The matrix can be created directly with cdist from scipy.spatial.distance:

import pandas as pd
from scipy.spatial.distance import cdist

# Pairwise Euclidean distances between all coordinate rows.
df_array = df[["LATITUDE", "LONGITUDE"]].to_numpy()
dist_mat = cdist(df_array, df_array)
pd.DataFrame(dist_mat, columns=df["CITY"], index=df["CITY"])
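
For comparison, the cross join from the question can also be carried through to the matrix, although the SciPy calls make it unnecessary. The sketch below assumes the sample rows are built inline; pairs, DIST, and dist are illustrative names, and the _x/_y suffixes are the ones passed to merge here.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, built inline so the sketch is runnable.
df = pd.DataFrame({
    "CITY": ["A", "B", "C", "D", "E"],
    "LATITUDE": [40.745392, 42.562786, 37.227928, 41.245708, 41.308273],
    "LONGITUDE": [-73.978364, -114.460503, -77.401924, -75.881241, -72.927887],
})

# Cross join via a constant key, as in the question: every city is paired
# with every city, giving n * n rows.
keyed = df.assign(key=1)
pairs = keyed.merge(keyed, on="key", suffixes=("_x", "_y"))

# Euclidean distance for each pair of rows.
pairs["DIST"] = np.hypot(
    pairs["LATITUDE_x"] - pairs["LATITUDE_y"],
    pairs["LONGITUDE_x"] - pairs["LONGITUDE_y"],
)

# Reshape the long table into the square matrix and drop the axis names.
dist = pairs.pivot(index="CITY_x", columns="CITY_y", values="DIST")
dist = dist.rename_axis(index=None, columns=None)
print(dist.round(6))
```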

answered Oct 01 '22 11:10

simplyPTA

Tags: python, pandas, dataframe