Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

vector binding in R

Tags:

r

I would like to implement a simulation program, which requires the following structure:

It has a for loop, the program will generate an vector in each iteration. I need each generated vector is appended to the existing vector.

I do not how how to do this in R. Thanks for the help.

like image 737
bit-question Avatar asked Aug 29 '10 16:08

bit-question


2 Answers

These answers work, but they all require a call to a non-deterministic function like sample() in the loop. This is not loop-invariant code (it is random each time), but it can still be moved out of the for loop. The trick is to use the n argument and generate all the random numbers you need beforehand (if your problem allows this; some may not, but many do). Now you make one call rather than n calls, which matters if your n is large. Here is a quick example random walk (but many problems can be phrased this way). Also, full disclosure: I haven't had any coffee today, so please point out if you see an error :-)

steps <- 30
n <- 100
directions <- c(-1, 1)

results <- vector('list', n)

for (i in seq_len(n)) {
  walk <- numeric(steps)
  for (s in seq_len(steps)) {
    walk[s] <- sample(directions, 1)
  }
  results[[i]] <- sum(walk)
}

We can rewrite this with one call to sample():

all.steps <- sample(directions, n*steps, replace=TRUE)
dim(all.steps) <- c(n, steps)
walks <- apply(all.steps, 1, sum)

Proof of speed increase (n=10000):

> system.time({
+ for (i in seq_len(n)) {
+   walk <- numeric(steps)
+   for (s in seq_len(steps)) {
+     walk[s] <- sample(directions, 1)
+   }
+   results[[i]] <- sum(walk)
+ }})
   user  system elapsed 
  4.231   0.332   4.758 

> system.time({
+ all.steps <- sample(directions, n*steps, replace=TRUE)
+ dim(all.steps) <- c(n, steps)
+ walks <- apply(all.steps, 1, sum)
+ })
   user  system elapsed 
  0.010   0.001   0.012 

If your simulation needs just one random variable per simulation function call, use sapply(), or better yet the multicore package's mclapply(). Revolution Analytics's foreach package may be of use here too. Also, JD Long has a great presentation and post about simulating stuff in R on Hadoop via Amazon's EMR here (I can't find the video, but I'm sure someone will know).

Take home points:

  • Preallocate with numeric(n) or vector('list', n)
  • Push invariant code out of for loops. Cleverly push stochastic functions out of code with their n argument.
  • Try hard for sapply() or lapply(), or better yet mclapply.
  • Don't use x <- c(x, rnorm(100)). Every time you do this, a member of R-core kills a puppy.
like image 129
Vince Avatar answered Nov 07 '22 23:11

Vince


Probably the best thing you can do is preallocate a list of length n (n is number of iterations) and flatten out the list after you're done.

n <- 10
start <- vector("list", n)
for (i in 1:n) {
  a[[i]] <- sample(10)
}
start <- unlist(start)

You could do it the old nasty way. This may be slow for larger vectors.

start <- c()
for (i in 1:n) {
  add <- sample(10)
  start <- c(start, add)
}
like image 42
Roman Luštrik Avatar answered Nov 07 '22 21:11

Roman Luštrik