X_n can take values of -1 or 1, each with probability 0.5, and S_n = S_{n-1} + X_n. How can I compute the partial sum observed at time n, given by S_n = X_1 + X_2 + ... + X_n? I'm trying to simulate a random walk here. I did the following, but I'm not exactly sure it's right:
rw <- function(n) {
  x <- numeric(n)
  xdir <- c(TRUE, FALSE)
  step <- c(1, -1)
  for (i in 2:n) {
    if (sample(xdir, 1)) {
      x[i] <- x[i - 1] + sample(step, 1)
    } else {
      x[i] <- x[i - 1]
    }
  }
  list(x = x)
}
Please Help!
You can also do this really concisely and efficiently with cumsum:
set.seed(1)
n <- 1000
x <- cumsum(sample(c(-1, 1), n, TRUE))
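If you want to sanity-check the result, a quick look at the output (purely illustrative) could be:

head(x)                                        # the first few partial sums S_1, ..., S_6
plot(x, type = "l", xlab = "n", ylab = "S_n")  # trace of the walk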
This answer is just to explain why your code did not work; @jake-burkhead gave the way you should actually write it. In your code, you only take a step half of the time, because you sample from xdir to decide whether to move at all. Instead, I would recommend the following inside your loop:
for(i in 2:n){
x[i] <- x[i - 1] + sample(step, 1)
}
The sample(step, 1) call decides whether the walk moves by 1 or -1.
To compute the partial sums, you can use cumsum() after you generate x; the result will be a vector of the partial sums at each point in the walk.
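Putting this together, a minimal corrected version of your rw() function might look like the sketch below (the cumsum() one-liner in the other answer is still the more idiomatic route):

rw <- function(n) {
  step <- c(1, -1)
  x <- numeric(n)
  for (i in 2:n) {
    # take a +1 or -1 step on every iteration, not just half the time
    x[i] <- x[i - 1] + sample(step, 1)
  }
  x  # x[i] is the partial sum S_i of the walk
}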
This answer addresses timings of various base R methods for this calculation. It was inspired by the comments above, and by @josilber's comment on the fastest method, posted by Jake Burkhead.
Below, a variety of methods are used to calculate the random walk. To accomplish this, each method draws a total of 1000 values of either 1 or -1 using fnc, defined below. The timing test uses microbenchmark with 1000 replications for each method.
fnc <- function(n) sample(c(1L, -1L), n, replace=TRUE)
library(microbenchmark)
microbenchmark(all=cumsum(fnc(1000L)),
               reduce=Reduce("+", fnc(1000L), accumulate=TRUE),
               laplyRpCln=cumsum(unlist(lapply(rep.int(1L, 1000L), fnc))),
               laplyRpAn=cumsum(unlist(lapply(rep.int(1L, 1000L), function(x) fnc(1L)))),
               laplySqAn=cumsum(unlist(lapply(seq_len(1000L), function(x) fnc(1L)))),
               saplyRpCln=cumsum(sapply(rep.int(1L, 1000L), fnc)),
               saplyRpAn=cumsum(sapply(rep.int(1L, 1000L), function(x) fnc(1L))),
               saplySqAn=cumsum(sapply(seq_len(1000L), function(x) fnc(1L))),
               vaplyRpCln=cumsum(vapply(rep.int(1L, 1000L), fnc, FUN.VALUE=0)),
               vaplyRpAn=cumsum(vapply(rep.int(1L, 1000L), function(x) fnc(1L), FUN.VALUE=0)),
               vaplySqAn=cumsum(vapply(seq_len(1000L), function(x) fnc(1L), FUN.VALUE=0)),
               replicate=cumsum(replicate(1000L, fnc(1L))),
               forPre={vals <- numeric(1000L); for(i in seq_along(vals)) vals[i] <- fnc(1L); cumsum(vals)},
               forNoPre={vals <- numeric(0L); for(i in seq_len(1000L)) vals <- c(vals, fnc(1L)); cumsum(vals)},
               times=1000)
Here,

- all uses cumsum, pulling the whole sample at once.
- reduce uses Reduce to perform the summation.
- laplyRpCln uses lapply with unlist to return a vector; it iterates through 1000 instances of 1, calling the function directly by name.
- laplyRpAn does the same through an anonymous function, and laplySqAn iterates using seq rather than rep.
- saplyRpCln, saplyRpAn, and saplySqAn call sapply instead of lapply/unlist.
- vaplyRpCln, vaplyRpAn, and vaplySqAn use vapply in place of lapply/unlist.
- replicate uses replicate, where the default is simplify=TRUE.
- forPre uses a for loop that pre-allocates the vector and then fills it in.
- forNoPre uses a for loop that creates an empty numeric(0) vector and then uses c to concatenate onto that vector.

This returns
Unit: microseconds
expr min lq mean median uq max neval cld
all 25.634 31.0705 85.66495 33.6890 35.3400 49240.30 1000 a
reduce 542.073 646.7720 780.13592 696.4775 750.2025 51685.44 1000 b
laplyRpCln 4349.384 5026.4015 6433.60754 5409.2485 7209.3405 58494.44 1000 c e
laplyRpAn 4600.200 5281.6190 6513.58733 5682.0570 7488.0865 55239.04 1000 c e
laplySqAn 4616.986 5251.4685 6514.09770 5634.9065 7488.1560 54263.04 1000 c e
saplyRpCln 4362.324 5080.3970 6325.66531 5506.5330 7294.6225 59075.02 1000 cd
saplyRpAn 4701.140 5386.1350 6781.95655 5786.6905 7587.8525 55429.02 1000 e
saplySqAn 4651.682 5342.5390 6551.35939 5735.0610 7525.4725 55148.32 1000 c e
vaplyRpCln 4366.322 5046.0625 6270.66501 5482.8565 7208.0680 63756.83 1000 c
vaplyRpAn 4657.256 5347.2190 6724.35226 5818.5225 7580.3695 64513.37 1000 de
vaplySqAn 4623.897 5325.6230 6475.97938 5769.8130 7541.3895 14614.67 1000 c e
replicate 4722.540 5395.1420 6653.90306 5777.3045 7638.8085 59376.89 1000 c e
forPre 5911.107 6823.3040 8172.41411 7226.7820 9038.9550 56119.11 1000 f
forNoPre 8431.855 10584.6535 11401.64190 10910.0480 11267.5605 58566.27 1000 g
Notice that the first method is clearly the fastest. It is followed by pulling the full sample at once and then using Reduce to perform the summation. Among the *apply functions, the "clean" versions that call the function directly by name seem to have a minor performance advantage, and the lapply versions seem to be on par with vapply, although given the range of values this conclusion is not entirely straightforward. sapply appears to be the slowest, though the method of the function call dominates the choice of *apply function.
The two for loops performed the worst, with the pre-allocating for loop outperforming the for loop that grows the vector with c.
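For reference, these are the same two patterns as the forPre and forNoPre expressions above, just written out plainly; growing a vector with c() copies it on every iteration, which is what makes the second version slow.

# pre-allocate the full vector, then fill it in place
vals <- numeric(1000L)
for (i in seq_along(vals)) vals[i] <- fnc(1L)

# start from an empty vector and grow it with c(), copying on every iteration
vals <- numeric(0L)
for (i in seq_len(1000L)) vals <- c(vals, fnc(1L))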
Here, I'm running a patched version of R 3.4.1 (patched roughly August 23, 2017) on openSUSE 42.1.
Please let me know if you see any mistakes and I'll fix them as soon as I can. Thanks to Ben Bolker for prodding me to investigate the final function more, where I found a couple of bugs.