I am currently reading data into a dataframe that looks like this:
City      XCord  YCord
Boston    5      2
Phoenix   7      3
New York  8      1
.....     .      .
I want to create a Euclidean distance matrix from this data showing the distance between all city pairs, so that I get a resulting matrix like:
          Boston  Phoenix  New York
Boston    0       2.236    3.162
Phoenix   2.236   0        2.236
New York  3.162   2.236    0
There are many more cities and coordinates in my actual data frame, so I need to be able to iterate over all of the city pairs and create a distance matrix like the one shown above, but I am not sure how to pair all of the cities together and apply the Euclidean distance formula. Any help would be appreciated.
distance_matrix(x, y, p=2)

Parameters:
    x : (M, K) Matrix of M vectors, each of dimension K.
    y : (N, K) Matrix of N vectors, each of dimension K.
    p : float, 1 <= p <= infinity, defines which Minkowski p-norm to use.

Returns:
    (M, N) ndarray / matrix containing the distance from every vector in x to every vector in y.
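To make the signature above concrete, here is a small sketch of how the shapes and the p parameter behave. The points in x and y are made up for illustration and are not from the question; the only library call used is scipy.spatial.distance_matrix.

```python
import numpy as np
from scipy.spatial import distance_matrix

# Illustrative points only (not from the question): M = 2 and N = 3 vectors, K = 2.
x = np.array([[0, 0], [3, 4]])
y = np.array([[0, 0], [3, 0], [0, 4]])

# Default p=2 gives Euclidean distances; the result has shape (M, N) = (2, 3).
euclidean = distance_matrix(x, y)

# p=1 switches to the Manhattan (city-block) distance.
manhattan = distance_matrix(x, y, p=1)

print(euclidean.shape)
print(euclidean)
print(manhattan)
```

Passing the same array as both x and y, as in the answer further down, should give the square all-pairs matrix the question asks for.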
A distance matrix is a table that shows the distance between pairs of objects. For example, in the table below we can see a distance of 16 between A and B, of 47 between A and C, and so on. By definition, an object's distance from itself, which is shown in the main diagonal of the table, is 0.
In mathematics, computer science and especially graph theory, a distance matrix is a square matrix (two-dimensional array) containing the distances, taken pairwise, between the elements of a set. Depending upon the application involved, the distance being used to define this matrix may or may not be a metric.
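As a rough illustration of that definition, the sketch below builds the matrix for the three cities in the question using only the standard library (math.dist needs Python 3.8+) and checks that it is square, symmetric, and zero on the diagonal. The variable names are my own, not from any library.

```python
import math

# Coordinates taken from the question's example table.
cities = {"Boston": (5, 2), "Phoenix": (7, 3), "New York": (8, 1)}
names = list(cities)

# Pair every city with every city (itself included) and take the distance.
matrix = [[math.dist(cities[a], cities[b]) for b in names] for a in names]

# Properties from the definition: square, zero diagonal, symmetric.
n = len(names)
is_square = all(len(row) == n for row in matrix)
zero_diagonal = all(matrix[i][i] == 0 for i in range(n))
is_symmetric = all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))

for name, row in zip(names, matrix):
    print(f"{name:<9}", "  ".join(f"{d:.3f}" for d in row))
```

Rounded to three decimals, the off-diagonal entries match the 2.236 and 3.162 values in the question's expected matrix.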
I think you are interested in scipy.spatial.distance_matrix.
For example:
Create data:
    import pandas as pd
    from scipy.spatial import distance_matrix

    data = [[5, 7], [7, 3], [8, 1]]
    ctys = ['Boston', 'Phoenix', 'New York']
    df = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)
Output:
              xcord  ycord
    Boston        5      7
    Phoenix       7      3
    New York      8      1
Using the distance matrix function:
    pd.DataFrame(distance_matrix(df.values, df.values), index=df.index, columns=df.index)
Results:
                Boston   Phoenix  New York
    Boston    0.000000  4.472136  6.708204
    Phoenix   4.472136  0.000000  2.236068
    New York  6.708204  2.236068  0.000000
If you don't want to use SciPy, you can use a list comprehension instead:

    import numpy as np

    # xy_list holds one NumPy array of coordinates per city, e.g. df.values
    dist = lambda p1, p2: np.sqrt(((p1 - p2) ** 2).sum())
    dm = np.asarray([[dist(p1, p2) for p2 in xy_list] for p1 in xy_list])
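For larger inputs, the nested Python loops can probably be replaced with NumPy broadcasting. This is only a sketch of that alternative, not part of the original answer; the array xy reuses the coordinates from the SciPy example above, and the names xy, diff, and dm_vec are my own.

```python
import numpy as np

# Same coordinates as the SciPy example above (one row per city).
xy = np.array([[5, 7], [7, 3], [8, 1]])

# Shape (N, 1, K) minus shape (1, N, K) broadcasts to (N, N, K):
# every pairwise coordinate difference in one step.
diff = xy[:, None, :] - xy[None, :, :]

# Square, sum over the coordinate axis, and take the root to get an (N, N) matrix.
dm_vec = np.sqrt((diff ** 2).sum(axis=-1))

print(dm_vec)
```

The intermediate diff array takes N * N * K elements, so for very large N the memory cost may outweigh the speed gain.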