I am running the following for loop around the gwr.basic function in the GWmodel package in R. What I need to do is collect the mean of the estimated parameter for each given bandwidth. The code looks like this:
library(GWmodel)
data("DubVoter")
# Dub.voter

LARentMean <- list()
for (i in 20:21) {
  gwr.res <- gwr.basic(GenEl2004 ~ DiffAdd + LARent + SC1 + Unempl +
                         LowEduc + Age18_24 + Age25_44 + Age45_64,
                       data = Dub.voter, bw = i, kernel = "bisquare",
                       adaptive = TRUE, F123.test = TRUE)
  a <- mean(gwr.res$SDF$LARent)
  LARentMean[i] <- a
}
outcome <- unlist(LARentMean)
> outcome
[1] -0.1117668 -0.1099969
However, it is terribly slow at returning the results, and I need a much wider range, such as 20:200. Is there a way to speed the process up? If not, how can I use a stepped range, say 20 to 200 in steps of 5, to reduce the number of operations?
I am a Python user new to R. I have read on SO that R is well known for being slow at for loops and that there are more efficient alternatives. More clarity on this point would be welcome.
I got the same impression as @musically_ut. The for loop and the traditional for-vs-apply debate are unlikely to help you here. Try parallelization if you have more than one core. There are several packages for this, such as parallel or snowfall. Which package is ultimately the best and fastest depends on your machine and operating system.
Best does not always equal fastest here: code that works cross-platform can be worth more than a bit of extra performance, and transparency and ease of use can outweigh maximum speed. That being said, I like the standard solution a lot and would recommend using parallel, which ships with R and works on Windows, OS X, and Linux.
EDIT: here's a fully reproducible example based on the OP's code.
library(GWmodel)
library(parallel)
data("DubVoter")

bwlist <- list(bw1 = 20, bw2 = 21)

# start one worker per core
cl <- makeCluster(detectCores())
# load 'GWmodel' on each node
clusterEvalQ(cl, library(GWmodel))
# export the inputs to each node
clusterExport(cl, varlist = c("bwlist", "Dub.voter"))

out <- parLapply(cl, bwlist, function(e) {
  try(gwr.basic(GenEl2004 ~ DiffAdd + LARent + SC1 +
                  Unempl + LowEduc + Age18_24 + Age25_44 +
                  Age45_64, data = Dub.voter,
                bw = e, kernel = "bisquare",
                adaptive = TRUE, F123.test = TRUE))
})

# extract the LARent column from each fit's SDF and take its mean
LArent_l <- lapply(lapply(out, "[[", "SDF"), "[[", "LARent")
unlist(lapply(LArent_l, mean))

# finally, stop the cluster
stopCluster(cl)
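As a small variation on the above, you can also compute the mean on the workers themselves, so that only one number per bandwidth travels back from the cluster instead of the whole fitted model. This is a sketch, not part of the original answer; the helper name fit_and_mean is hypothetical, and it assumes cl, bwlist, Dub.voter, and GWmodel are set up on the nodes exactly as above:
# assumes cl, bwlist and Dub.voter are set up as above
fit_and_mean <- function(bw) {
  res <- gwr.basic(GenEl2004 ~ DiffAdd + LARent + SC1 +
                     Unempl + LowEduc + Age18_24 + Age25_44 +
                     Age45_64, data = Dub.voter,
                   bw = bw, kernel = "bisquare",
                   adaptive = TRUE, F123.test = TRUE)
  mean(res$SDF$LARent)  # return only the summary statistic
}
outcome <- unlist(parLapply(cl, bwlist, fit_and_mean))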
Besides using parallelization, as Matt Bannert suggests, you should preallocate the vector LARentMean. Often it is not the for loop itself that is slow, but the fact that the loop seduces you into doing slow things like growing a vector element by element.
Consider the following example to see the impact of a growing vector as compared to preallocating the memory:
library(microbenchmark)
# grow the list one element at a time
growing <- function(x) {
  mylist <- list()
  for (i in 1:x) {
    mylist[[i]] <- i
  }
}

# preallocate the list at its final length
allocate <- function(x) {
  mylist <- vector(mode = "list", length = x)
  for (i in 1:x) {
    mylist[[i]] <- i
  }
}
microbenchmark(growing(1000), allocate(1000), times = 1000)
# Unit: microseconds
# expr min lq mean median uq max neval
# growing(1000) 3055.134 4284.202 4743.4874 4433.024 4655.616 47977.236 1000
# allocate(1000) 867.703 917.738 998.0719 956.441 995.143 2564.192 1000
The growing list is about 5 times slower than the version that preallocates the memory.
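Applied to the question's loop, here is a minimal sketch with preallocation that also uses seq() for the stepped bandwidth range you asked about. It assumes GWmodel and the DubVoter data are loaded as in the question:
bws <- seq(20, 200, by = 5)           # bandwidths 20, 25, ..., 200
LARentMean <- numeric(length(bws))    # preallocate the result vector
for (k in seq_along(bws)) {
  gwr.res <- gwr.basic(GenEl2004 ~ DiffAdd + LARent + SC1 +
                         Unempl + LowEduc + Age18_24 + Age25_44 +
                         Age45_64, data = Dub.voter,
                       bw = bws[k], kernel = "bisquare",
                       adaptive = TRUE, F123.test = TRUE)
  LARentMean[k] <- mean(gwr.res$SDF$LARent)
}
names(LARentMean) <- bws              # label each mean with its bandwidth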