Creating a Distance Matrix?

Q: What does a distance matrix look like?

A distance matrix is a table that shows the distance between pairs of objects. For example, such a table might show a distance of 16 between A and B, of 47 between A and C, and so on. By definition, an object's distance from itself, shown on the main diagonal of the table, is 0.

Q: What do you mean by distance matrix?

In mathematics, computer science and especially graph theory, a distance matrix is a square matrix (two-dimensional array) containing the distances, taken pairwise, between the elements of a set. Depending upon the application involved, the distance being used to define this matrix may or may not be a metric.
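To make the "may or may not be a metric" remark concrete, here is a minimal sketch of how one might check a given matrix. The function name `is_metric` and both example matrices are made up for illustration and do not come from the original text. The check covers non-negativity, a zero diagonal, symmetry, and the triangle inequality; it does not verify that distinct objects have strictly positive distance.

```python
import numpy as np

def is_metric(dm, tol=1e-9):
    """Return True if the square matrix dm passes basic metric checks (up to tol)."""
    dm = np.asarray(dm, dtype=float)
    if dm.ndim != 2 or dm.shape[0] != dm.shape[1]:
        return False  # a distance matrix must be square
    zero_diagonal = np.allclose(np.diag(dm), 0.0, atol=tol)
    non_negative = bool((dm >= -tol).all())
    symmetric = np.allclose(dm, dm.T, atol=tol)
    # Triangle inequality: dm[i, k] <= dm[i, j] + dm[j, k] for all i, j, k.
    via = dm[:, :, None] + dm[None, :, :]  # via[i, j, k] = dm[i, j] + dm[j, k]
    triangle = bool((dm[:, None, :] <= via + tol).all())
    return zero_diagonal and non_negative and symmetric and triangle

# Hypothetical matrices: the first comes from three collinear points,
# the second breaks the triangle inequality (5 > 1 + 1).
euclidean_like = [[0, 5, 10], [5, 0, 5], [10, 5, 0]]
not_a_metric = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]

print(is_metric(euclidean_like))
print(is_metric(not_a_metric))
```

The triangle check builds an n x n x n array, so this is only sensible for small matrices.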

Tags: python, dataframe, numpy

I am currently reading data into a dataframe that looks like this:

    City        XCord    YCord
    Boston          5        2
    Phoenix         7        3
    New York        8        1
    .....           .        .

I want to create a Euclidean distance matrix from this data showing the distance between all city pairs, so that I get a resulting matrix like:

                Boston    Phoenix    New York
    Boston       0         2.236      3.162
    Phoenix      2.236     0          2.236
    New York     3.162     2.236      0

There are many more cities and coordinates in my actual data frame, so I need to be able to iterate over all of the city pairs and create a distance matrix like the one shown above, but I am not sure how to pair all of the cities together and apply the Euclidean distance formula. Any help would be appreciated.


asked Apr 06 '15 23:04

Jeremy

2 Answers

I think you are interested in distance_matrix from scipy.spatial.

For example:

Create data:

    import pandas as pd
    from scipy.spatial import distance_matrix

    data = [[5, 7], [7, 3], [8, 1]]
    ctys = ['Boston', 'Phoenix', 'New York']
    df = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)

Output:

              xcord  ycord
    Boston        5      7
    Phoenix       7      3
    New York      8      1

Using the distance matrix function:

    pd.DataFrame(distance_matrix(df.values, df.values), index=df.index, columns=df.index)

Results:

                Boston   Phoenix  New York
    Boston    0.000000  4.472136  6.708204
    Phoenix   4.472136  0.000000  2.236068
    New York  6.708204  2.236068  0.000000
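Note that the sample data in this answer differs from the coordinates in the question (Boston is (5, 7) here but (5, 2) in the question), which is why the numbers do not match the asker's expected matrix. As a rough cross-check, the sketch below runs the question's coordinates through `pdist` and `squareform` from `scipy.spatial.distance`, a different route from the `distance_matrix` call above that computes each pair only once. The names `df_q` and `dm_q` are made up for this example.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Coordinates as given in the question (not this answer's sample data).
df_q = pd.DataFrame(
    [[5, 2], [7, 3], [8, 1]],
    columns=['XCord', 'YCord'],
    index=['Boston', 'Phoenix', 'New York'],
)

# pdist returns the condensed distances (one per unordered pair);
# squareform expands them into the full symmetric matrix.
dm_q = pd.DataFrame(
    squareform(pdist(df_q.values, metric='euclidean')),
    index=df_q.index,
    columns=df_q.index,
)

print(dm_q.round(3))
```

Rounded to three decimals, this should reproduce the 2.236 and 3.162 values from the question.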

answered Sep 28 '22 10:09

Andrew

If you don't want to use scipy, you can use a list comprehension like this:

    import numpy as np

    # xy_list is assumed to be an array of coordinate rows, e.g. df.values
    dist = lambda p1, p2: np.sqrt(((p1 - p2) ** 2).sum())
    dm = np.asarray([[dist(p1, p2) for p2 in xy_list] for p1 in xy_list])
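For larger inputs, the nested comprehension can be swapped for NumPy broadcasting. This is a sketch of an alternative, not what the answer itself proposes; the names `xy`, `diff`, and `dm_vec` are made up here, and the coordinates are the sample data from the scipy answer.

```python
import numpy as np

# Same sample coordinates as the scipy answer above.
xy = np.array([[5, 7], [7, 3], [8, 1]], dtype=float)

# diff[i, j] is the coordinate difference between point i and point j.
diff = xy[:, np.newaxis, :] - xy[np.newaxis, :, :]

# Sum squared differences over the coordinate axis, then take the root.
dm_vec = np.sqrt((diff ** 2).sum(axis=-1))

print(dm_vec)
```

The intermediate `diff` array holds n x n x d values, so memory use grows quadratically with the number of cities.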

answered Sep 28 '22 11:09

francesco lc

