Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Matrix of distances with Geosphere: avoid repeat calculus

I want to compute the distance among all points in a very large matrix using distm from geosphere.

See a minimal example:

library(geosphere)
library(data.table)

coords <- data.table(coordX=c(1,2,5,9), coordY=c(2,2,0,1))
distances <- distm(coords, coords, fun = distGeo)

The issue is that due to the nature of the distances I am computing, distm gives me back a symmetric matrix, therefore, I could avoid to calculate more than half of the distances:

structure(c(0, 111252.129800202, 497091.059564718, 897081.91986428, 
111252.129800202, 0, 400487.621661164, 786770.053508848, 497091.059564718, 
400487.621661164, 0, 458780.072878927, 897081.91986428, 786770.053508848, 
458780.072878927, 0), .Dim = c(4L, 4L))

May you help me to find a more efficient way to compute all those distances avoiding doing twice each one?

like image 612
LocoGris Avatar asked Sep 10 '25 16:09

LocoGris


2 Answers

If you want to compute all pairwise distances for points x, it is better to use distm(x) rather than distm(x,x). The distm function returns the same symmetric matrix in both cases but when you pass it a single argument it knows that the matrix is symmetric, so it won't do unnecessary computations.

You can time it.

library("geosphere")

n <- 500
xy <- matrix(runif(n*2, -90, 90), n, 2)

system.time( replicate(100, distm(xy, xy) ) )
#  user  system elapsed 
# 61.44    0.23   62.79 
system.time( replicate(100, distm(xy) ) )
#  user  system elapsed 
# 36.27    0.39   38.05 

You can also look at the R code for geosphere::distm to check that it treats the two cases differently.

Aside: Quick google search finds parallelDist: Parallel Distance Matrix Computation on CRAN. The geodesic distance is an option.

like image 153
dipetkov Avatar answered Sep 13 '25 05:09

dipetkov


You can prepare a data frame of possible combinations without repetitions (with gtools packages). Then to compute distances for those pairs. Here is the code:

library(gtools)
library(geosphere)
library(data.table)

coords <- data.table(coordX = c(1, 2, 5, 9), coordY = c(2, 2, 0, 1))
pairs <- combinations(n = nrow(coords), r = 2, repeats.allowed = F, v = c(1:nrow(coords)))

distances <- apply(pairs, 1, function(x) {
    distm(coords[x[1], ], coords[x[2], ], fun = distGeo)
})

# Construct distances matrix
dist_mat <- matrix(NA, nrow = nrow(coords), ncol = nrow(coords))
dist_mat[upper.tri(dist_mat)] <- distances
dist_mat[lower.tri(dist_mat)] <- distances
dist_mat[is.na(dist_mat)] <- 0

print(dist_mat)

The results:

         [,1]     [,2]     [,3]     [,4]
[1,]      0.0 111252.1 497091.1 400487.6
[2,] 111252.1      0.0 897081.9 786770.1
[3,] 497091.1 400487.6      0.0 458780.1
[4,] 897081.9 786770.1 458780.1      0.0
like image 32
Istrel Avatar answered Sep 13 '25 07:09

Istrel