
Creating a Distance Matrix?

I am currently reading data into a dataframe that looks like this:

City         XCord    YCord
Boston         5        2
Phoenix        7        3
New York       8        1
.....          .        .

I want to create a Euclidean distance matrix from this data showing the distance between all city pairs, so that I get a resulting matrix like:

             Boston    Phoenix   New York
Boston         0        2.236      3.162
Phoenix        2.236      0        2.236
New York       3.162    2.236        0

There are many more cities and coordinates in my actual data frame, so I need to iterate over all of the city pairs and create a distance matrix like the one shown above. I am not sure how to pair all of the cities together and apply the Euclidean distance formula. Any help would be appreciated.

Jeremy asked Apr 06 '15 23:04
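As a quick check on the expected matrix in the question, each entry follows from the two-dimensional Euclidean distance formula. The worked values below use the coordinates from the question's table, Boston (5, 2), Phoenix (7, 3) and New York (8, 1). The symbols x and y are just shorthand for the XCord and YCord columns.

```latex
d(p, q) = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2}

d(\text{Boston}, \text{Phoenix}) = \sqrt{(5 - 7)^2 + (2 - 3)^2} = \sqrt{5} \approx 2.236

d(\text{Boston}, \text{New York}) = \sqrt{(5 - 8)^2 + (2 - 1)^2} = \sqrt{10} \approx 3.162
```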



2 Answers

I think you are interested in scipy.spatial.distance_matrix.

For example:

Create data:

import pandas as pd
from scipy.spatial import distance_matrix

data = [[5, 7], [7, 3], [8, 1]]
ctys = ['Boston', 'Phoenix', 'New York']
df = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)

Output:

          xcord  ycord
Boston        5      7
Phoenix       7      3
New York      8      1

Using the distance matrix function:

 pd.DataFrame(distance_matrix(df.values, df.values), index=df.index, columns=df.index) 

Results:

            Boston   Phoenix  New York
Boston    0.000000  4.472136  6.708204
Phoenix   4.472136  0.000000  2.236068
New York  6.708204  2.236068  0.000000

Andrew answered Sep 28 '22 10:09
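The answer above builds its frame with the city names already in the index and with different Boston coordinates than the question. As a minimal sketch, assuming the question's frame has the columns City, XCord and YCord exactly as shown in its table, the same distance_matrix call can be applied after moving City into the index:

```python
import pandas as pd
from scipy.spatial import distance_matrix

# Sample rows copied from the question; the column names are assumed to
# match the question's table header (City, XCord, YCord).
df = pd.DataFrame({
    "City": ["Boston", "Phoenix", "New York"],
    "XCord": [5, 7, 8],
    "YCord": [2, 3, 1],
})

# Use the city names as row labels so that only coordinates remain as values.
coords = df.set_index("City")[["XCord", "YCord"]]

# Pairwise Euclidean distances (p=2 is the default Minkowski norm).
dm = pd.DataFrame(
    distance_matrix(coords.values, coords.values),
    index=coords.index,
    columns=coords.index,
)
print(dm.round(3))
```

With the question's coordinates this should reproduce the matrix the question asks for (2.236 for Boston to Phoenix, 3.162 for Boston to New York).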


If you don't want to use SciPy, you can use a list comprehension instead:

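import numpy as np

# xy_list is assumed to be a sequence of NumPy coordinate arrays, e.g. df.values
dist = lambda p1, p2: np.sqrt(((p1 - p2) ** 2).sum())
dm = np.asarray([[dist(p1, p2) for p2 in xy_list] for p1 in xy_list])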

francesco lc answered Sep 28 '22 11:09
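The list comprehension above calls a Python function once per city pair. A possible alternative that also avoids SciPy is NumPy broadcasting, which computes all pairs in one vectorized expression. This is only a sketch: the array name xy is hypothetical, and the coordinates are the three sample rows from the question.

```python
import numpy as np

# Coordinates from the question, one row per city (Boston, Phoenix, New York).
xy = np.array([[5, 2], [7, 3], [8, 1]], dtype=float)

# diff[i, j] holds the coordinate difference between city i and city j,
# obtained by broadcasting an (N, 1, 2) view against a (1, N, 2) view.
diff = xy[:, None, :] - xy[None, :, :]

# Square, sum over the coordinate axis, and take the root: an (N, N) matrix.
dm = np.sqrt((diff ** 2).sum(axis=-1))
print(dm.round(3))
```

The intermediate diff array has N x N x 2 entries, so for a very large number of cities the memory cost may matter more than the speed gain.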