Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rewriting loops with apply functions

I have the 3 following functions which I would like to make faster, I assume apply functions are the best way to go, but I have never used apply functions, so I have no idea what to do. Any type of hints, ideas and code snippets will be much appreciated.

n, T, dt are global parameters and par is a vector of parameters.

Function 1: is a function to create an m+1,n matrix containing poisson distributed jumps with exponentially distributed jump sizes. My troubles here is because I have 3 loops and I am not sure how to incorporate the if statement in the inner loop. Also I have no idea if it is at all possible to use apply functions on the outer layers of the loops only.

jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n) # initializing output matrix
  U <- replicate(n,runif(100,t,T)) #matrix used to decide when the jumps will happen
  Y <-replicate(n,rexp(100,1/par[6])) #matrix with jump sizes
  for (l in 1:n){ 
    NT <- rpois(1,par[5]*T) #number of jumps
    k=0
    for (j in seq(t,T,dt)){
      k=k+1
      if (NT>0){
        temp=0
        for (i in 1:NT){
          u <- vector("numeric",NT)
          if (U[i,l]>j){ u[i]=0
          }else u[i]=1
          temp=temp+Y[i,l]*u[i]
        }
        jump[k,l]=temp
      }else jump[k,l]=0
    }
  }
  return(jump)
}

Function 2: calculates a default intensity, based on Brownian motions and the jumps from function 1. Here my trouble is how to use apply functions when the variable used for the calculation is the values from the row above in the output matrix AND how to get the right values from the external matrices which are used in the calculations (BMz_C & J)

lambda <- function(t=0,T=T,par,fit=0){
  lambda <- matrix(0,m+1,n) # matrix to hold intesity path output
  lambda[1,] <- par[4] #initializing start value of the intensity path.
  J <- jump(t,T,par) #matrix containing jumps
  for(i in 2:(m+1)){
    dlambda <- par[1]*(par[2]-max(lambda[i-1,],0))*dt+par[3]*sqrt(max(lambda[i-   1,],0))*BMz_C[i,]+(J[i,]-J[i-1,])
    lambda[i,] <- lambda[i-1,]+dlambda
  } 
  return(lambda)
} 

Function 3: calculates a survival probability based on the intensity from function 2. Here a() and B() are functions that return numerical values. My problem here is that the both value i and j are used because i is not always an integer which thus can to be used to reference the external matrix. I have earlier tried to use i/dt, but sometimes it would overwrite one line and skip the next lines in the matrix, most likely due to rounding errors.

S <- function(t=0,T=T,par,plot=0, fit=0){
  S <- matrix(0,(T-t)/dt+1,n)
  if (fit > 0) S.fit <- matrix(0,1,length(mat)) else S.fit <- 0
  l=lambda(t,T,par,fit)
  j=0
  for (i in seq(t,T,dt)){
    j=j+1
    S[j,] <- a(i,T,par)*exp(B(i,T,par)*l[j,])
  }
  return(S)
}

Sorry for the long post, any help for any of the functions will be much appreciated.

EDIT: First of all thanks to digEmAll for the great reply.

I have now worked on vectorising function 2. First I tried

lambda <- function(t=0,T=T,par,fit=0){
  lambda <- matrix(0,m+1,n) # matrix to hold intesity path input
  J <- jump(t,T,par,fit)
    lambda[1,] <- par[4] 
    lambda[2:(m+1),] <- sapply(2:(m+1), function(i){
      lambda[i-1,]+par[1]*(par[2]-max(lambda[i-1,],0))*dt+par[3]*sqrt(max(lambda[i-1,],0))*BMz_C[i,]+(J[i,]-J[i-1,])
    })
  return(lambda)
}

but it would only produce the first column. So I tried a two step apply function.

lambda <- function(t=0,T=T,par,fit=0){
  lambda <- matrix(0,m+1,n) # matrix to hold intesity path input
  J <- jump(t,T,par,fit)
  lambda[1,] <- par[4] 
  lambda[2:(m+1),] <- sapply(1:n, function(l){
    sapply(2:(m+1), function(i){
      lambda[i-1,l]+par[1]*(par[2]-max(lambda[i-1,l],0))*dt+par[3]*sqrt(max(lambda[i-1,l],0))*BMz_C[i,l]+(J[i,l]-J[i-1,l])
    })
  })
  return(lambda)
}

This seems to work, but only on the first row, all rows after that have an identical non-zero value, as if lambda[i-1] is not used in the calculation of lambda[i], does anyone have an idea how to manage that?

like image 307
Marcus Avatar asked Jun 01 '14 14:06

Marcus


1 Answers

I'm going to explain to you, setp-by-step, how to vectorize the first function (one possible way of vectorization, maybe not the best one for your case).
For the others 2 functions, you can simply apply the same concepts and you should be able to do it.

Here, the key concept is: start to vectorize from the innermost loop.


1) First of all, rpois can generate more than one random value at a time but you are calling it n-times asking one random value. So, let's take it out of the loop obtaining this:

jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n) 
  U <- replicate(n,runif(100,t,T)) 
  Y <-replicate(n,rexp(100,1/par[6])) 
  NTs <- rpois(n,par[5]*T) # note the change
  for (l in 1:n){ 
    NT <- NTs[l] # note the change
    k=0
    for (j in seq(t,T,dt)){
      k=k+1
      if (NT>0){
        temp=0
        for (i in 1:NT){
          u <- vector("numeric",NT)
          if (U[i,l]>j){ u[i]=0
          }else u[i]=1
          temp=temp+Y[i,l]*u[i]
        }
        jump[k,l]=temp
      }else jump[k,l]=0
    }
  }
  return(jump)
}

2) Similarly, it is useless/inefficient to call seq(t,T,dt) n-times in the loop since it will always generate the same sequence. So, let's take it out of the loop and store into a vector, obtainig this:

jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n) 
  U <- replicate(n,runif(100,t,T)) 
  Y <-replicate(n,rexp(100,1/par[6])) 
  NTs <- rpois(n,par[5]*T) 
  js <- seq(t,T,dt) # note the change
  for (l in 1:n){ 
    NT <- NTs[l] 
    k=0
    for (j in js){ # note the change
      k=k+1
      if (NT>0){
        temp=0
        for (i in 1:NT){
          u <- vector("numeric",NT)
          if (U[i,l]>j){ u[i]=0
          }else u[i]=1
          temp=temp+Y[i,l]*u[i]
        }
        jump[k,l]=temp
      }else jump[k,l]=0
    }
  }
  return(jump)
}

3) Now, let's have a look at the innermost loop:

for (i in 1:NT){
  u <- vector("numeric",NT)
  if (U[i,l]>j){ u[i]=0
  }else u[i]=1
  temp=temp+Y[i,l]*u[i]
}

this is equal to :

u <- as.integer(U[1:NT,l]<=j)
temp <- sum(Y[1:NT,l]*u)

or, in one-line:

temp <- sum(Y[1:NT,l] * as.integer(U[1:NT,l] <= j))

hence, now the function can be written as :

jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n) 
  U <- replicate(n,runif(100,t,T)) 
  Y <-replicate(n,rexp(100,1/par[6])) 
  NTs <- rpois(n,par[5]*T) 
  js <- seq(t,T,dt) 
  for (l in 1:n){ 
    NT <- NTs[l] 
    k=0
    for (j in js){ 
      k=k+1
      if (NT>0){
        jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) # note the change
      }else jump[k,l]=0
    }
  }
  return(jump)
}

4) Again, let's have a look at the current innermost loop:

for (j in js){ 
  k=k+1
  if (NT>0){
      jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) # note the change
  }else jump[k,l]=0
}

as you can notice, NT does not depend on the iteration of this loop, so the inner if can be moved outside, as follows:

if (NT>0){
  for (j in js){ 
    k=k+1
    jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) # note the change
  }
}else{
  for (j in js){ 
    k=k+1
    jump[k,l]=0
  }
}

this seems worse than before, well yes it is, but now the 2 conditions can be turned into one-liner's (note the use of sapply¹):

if (NT>0){
  jump[1:length(js),l] <- sapply(js,function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
}else{
  jump[1:length(js),l] <- 0
}

obtaining the following jump function:

jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n) 
  U <- replicate(n,runif(100,t,T)) 
  Y <-replicate(n,rexp(100,1/par[6])) 
  NTs <- rpois(n,par[5]*T) 
  js <- seq(t,T,dt) 
  for (l in 1:n){ 
    NT <- NTs[l] 
    if (NT>0){
      jump[1:length(js),l] <- sapply(js,function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
    }else{
      jump[1:length(js),l] <- 0
    }
  }
  return(jump)
}

5) finally we can get rid of the last loop, using again the sapply¹ function, obtaining the final jump function :

jump <- function(t=0,T=T,par){
  U <- replicate(n,runif(100,t,T)) 
  Y <-replicate(n,rexp(100,1/par[6])) 
  js <- seq(t,T,dt)
  NTs <- rpois(n,par[5]*T)

  jump <- sapply(1:n,function(l){
    NT <- NTs[l] 
    if (NT>0){
      sapply(js,function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
    }else {
      rep(0,length(js))
    }
  })
  return(jump)
}

(¹)

sapply function is pretty easy to use. For each element of the list or vector passed in the X parameter, it applies the function passed in the FUN parameter, e.g. :

vect <- 1:3
sapply(X=vect,FUN=function(el){el+10}
# [1] 11 12 13

since by default the simplify parameter is true, the result is coerced to the simplest possible object. So, for example in the previous case the result becomes a vector, while in the following example result become a matrix (since for each element we return a vector of the same size) :

vect <- 1:3
sapply(X=vect,FUN=function(el){rep(el,5)})

#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    1    2    3
# [3,]    1    2    3
# [4,]    1    2    3
# [5,]    1    2    3

Benchmark :

The following benchmark just give you an idea of the speed gain, but the actual performances may be different depending on your input parameters.
As you can imagine, jump_old corresponds to your original function 1, while jump_new is the final vectorized version.

# let's use some random parameters
n = 10
m = 3
T = 13
par = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
dt <- 3

set.seed(123)
system.time(for(i in 1:5000) old <- jump_old(T=T,par=par))
# user   system  elapsed 
# 12.39    0.00    12.41

set.seed(123)
system.time(for(i in 1:5000) new <- jump_new(T=T,par=par))
# user  system  elapsed 
# 4.49    0.00     4.53 

# check if last results of the 2 functions are the same:
isTRUE(all.equal(old,new))
# [1] TRUE
like image 170
digEmAll Avatar answered Oct 09 '22 10:10

digEmAll