Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Double for-loop operation in R (with an example)

Please look at the following small working example:

#### Pseudo data
nobs1 <- 4000
nobs2 <- 5000
mylon1 <- runif(nobs1, min=0, max=1)-76
mylat1 <- runif(nobs1, min=0, max=1)+37
mylon2 <- runif(nobs2, min=0, max=1)-76
mylat2 <- runif(nobs2, min=0, max=1)+37

#### define a distance function
thedistance <- function(lon1, lat1, lon2, lat2) {
 R <- 6371 # Earth mean radius [km]
 delta.lon <- (lon2 - lon1)
 delta.lat <- (lat2 - lat1)
 a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon/2)^2
 c <- 2 * asin(min(1,sqrt(a)))
 d = R * c
 return(d)
}

ptm <- proc.time()

#### Calculate distances between locations
# Initiate the resulting distance vector
ndistance <- nobs1*nobs2 # The number of distances
mydistance <- vector(mode = "numeric", length = ndistance)

k=1
for (i in 1:nobs1) {
 for (j in 1:nobs2) {
  mydistance[k] = thedistance(mylon1[i],mylat1[i],mylon2[j],mylat2[j])
  k=k+1
 }
}

proc.time() - ptm

The computation time:

  user  system elapsed 
249.85    0.16  251.18

Here, my question is whether there is still room for speeding up the double for-loop calculation. Thank you very much.

like image 510
Bill TP Avatar asked Dec 06 '14 07:12

Bill TP


2 Answers

Here's an option that decreases the runtime to ~2 seconds on my machine because part of it is vectorized.

A direct comparison with the original solution follows.

Test data:

nobs1 <- 4000
nobs2 <- 5000
mylon1 <- runif(nobs1, min=0, max=1)-76
mylat1 <- runif(nobs1, min=0, max=1)+37
mylon2 <- runif(nobs2, min=0, max=1)-76
mylat2 <- runif(nobs2, min=0, max=1)+37

Original solution:

#### define a distance function
thedistance <- function(lon1, lat1, lon2, lat2) {
  R <- 6371 # Earth mean radius [km]
  delta.lon <- (lon2 - lon1)
  delta.lat <- (lat2 - lat1)
  a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon/2)^2
  c <- 2 * asin(min(1,sqrt(a)))
  d = R * c
  return(d)
}

ptm <- proc.time()

#### Calculate distances between locations
# Initiate the resulting distance vector
ndistance <- nobs1*nobs2 # The number of distances
mydistance <- vector(mode = "numeric", length = ndistance)

k=1
for (i in 1:nobs1) {
  for (j in 1:nobs2) {
    mydistance[k] = thedistance(mylon1[i],mylat1[i],mylon2[j],mylat2[j])
    k=k+1
  }
}

proc.time() - ptm
   User      System     elapsed 
148.243       0.681     148.901 

My approach:

# modified (vectorized) distance function:
thedistance2 <- function(lon1, lat1, lon2, lat2) {
  R <- 6371 # Earth mean radius [km]
  delta.lon <- (lon2 - lon1)
  delta.lat <- (lat2 - lat1)
  a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon/2)^2
  c <- 2 * asin(pmin(1,sqrt(a)))   # pmin instead of min
  d = R * c
  return(d)
}

ptm2 <- proc.time()

lst <- vector("list", length = nobs1)

for (i in seq_len(nobs1)) {
    lst[[i]] = thedistance2(mylon1[i],mylat1[i],mylon2,mylat2)
}

res <- unlist(lst)

proc.time() - ptm2
   User      System     elapsed
  1.988       0.331       2.319 

Are the results all equal?

all.equal(mydistance, res)
#[1] TRUE
like image 197
talat Avatar answered Nov 14 '22 06:11

talat


Here is an other approach using Rcpp. On my machine it was slightly faster than beginneR's very nice vectorized version.

library(Rcpp)

nobs1 <- 4000
nobs2 <- 5000
mylon1 <- runif(nobs1, min=0, max=1)-76
mylat1 <- runif(nobs1, min=0, max=1)+37
mylon2 <- runif(nobs2, min=0, max=1)-76
mylat2 <- runif(nobs2, min=0, max=1)+37

sourceCpp("dist.cpp")

system.time({
  lst <- vector("list", length = nobs1)
  for (i in seq_len(nobs1)) {
    lst[[i]] = thedistance2(mylon1[i],mylat1[i],mylon2,mylat2)
  }
  res <- unlist(lst)
})

##  user  system elapsed 
## 4.636   0.084   4.737 


system.time(res2 <- dist_vec(mylon1, mylat1, mylon2, mylat2))
##  user  system elapsed 
## 2.584   0.044   2.633 

all.equal(res, res2)
## TRUE 

And the file dist.cpp countains

#include <Rcpp.h>
#include <iostream>
#include <math.h>

using namespace Rcpp;

double dist_cpp(double lon1, double lat1,
                double lon2, double lat2){
  int R = 6371; 
  double delta_lon = (lon2 - lon1);
  double delta_lat = (lat2 - lat1);
  double a = pow(sin(delta_lat/2.0), 2) + cos(lat1) * cos(lat2) * pow(sin(delta_lon/2.0), 2);
  double c;
  a = sqrt(a);
  if (a < 1.0) {
    c = 2 * asin(a);
  } else {
    c = 2 * asin(1);
  }
  double d = R * c;
  return(d);

}

// [[Rcpp::export]]
NumericVector dist_vec(NumericVector lon1, NumericVector lat1, 
                       NumericVector lon2, NumericVector lat2) {
  int n = lon1.size();
  int m = lon2.size();
  int k = n * m;
  NumericVector res(k);
  int c = 0;
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      res[c] = dist_cpp(lon1[i], lat1[i], lon2[j], lat2[j]);
      c++;
    }
  }
  return(res);
}
like image 34
johannes Avatar answered Nov 14 '22 05:11

johannes