I want to compute the distance among all points in a very large matrix using distm
from geosphere
.
See a minimal example:
library(geosphere)
library(data.table)
coords <- data.table(coordX=c(1,2,5,9), coordY=c(2,2,0,1))
distances <- distm(coords, coords, fun = distGeo)
The issue is that due to the nature of the distances I am computing, distm
gives me back a symmetric matrix, therefore, I could avoid to calculate more than half of the distances:
structure(c(0, 111252.129800202, 497091.059564718, 897081.91986428,
111252.129800202, 0, 400487.621661164, 786770.053508848, 497091.059564718,
400487.621661164, 0, 458780.072878927, 897081.91986428, 786770.053508848,
458780.072878927, 0), .Dim = c(4L, 4L))
May you help me to find a more efficient way to compute all those distances avoiding doing twice each one?
If you want to compute all pairwise distances for points x
, it is better to use distm(x)
rather than distm(x,x)
. The distm
function returns the same symmetric matrix in both cases but when you pass it a single argument it knows that the matrix is symmetric, so it won't do unnecessary computations.
You can time it.
library("geosphere")
n <- 500
xy <- matrix(runif(n*2, -90, 90), n, 2)
system.time( replicate(100, distm(xy, xy) ) )
# user system elapsed
# 61.44 0.23 62.79
system.time( replicate(100, distm(xy) ) )
# user system elapsed
# 36.27 0.39 38.05
You can also look at the R code for geosphere::distm
to check that it treats the two cases differently.
Aside: Quick google search finds parallelDist
: Parallel Distance Matrix Computation on CRAN. The geodesic distance is an option.
You can prepare a data frame of possible combinations without repetitions (with gtools
packages). Then to compute distances for those pairs. Here is the code:
library(gtools)
library(geosphere)
library(data.table)
coords <- data.table(coordX = c(1, 2, 5, 9), coordY = c(2, 2, 0, 1))
pairs <- combinations(n = nrow(coords), r = 2, repeats.allowed = F, v = c(1:nrow(coords)))
distances <- apply(pairs, 1, function(x) {
distm(coords[x[1], ], coords[x[2], ], fun = distGeo)
})
# Construct distances matrix
dist_mat <- matrix(NA, nrow = nrow(coords), ncol = nrow(coords))
dist_mat[upper.tri(dist_mat)] <- distances
dist_mat[lower.tri(dist_mat)] <- distances
dist_mat[is.na(dist_mat)] <- 0
print(dist_mat)
The results:
[,1] [,2] [,3] [,4]
[1,] 0.0 111252.1 497091.1 400487.6
[2,] 111252.1 0.0 897081.9 786770.1
[3,] 497091.1 400487.6 0.0 458780.1
[4,] 897081.9 786770.1 458780.1 0.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With