Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simulating a timeseries in dplyr instead of using a for loop

Tags:

r

dplyr

tidyverse

So, while lag and lead in dplyr are great, I want to simulate a timeseries of something like population growth. My old school code would look something like:

tdf <- data.frame(time=1:5, pop=50)
for(i in 2:5){
  tdf$pop[i] = 1.1*tdf$pop[i-1]
}

which produces

  time    pop
1    1 50.000
2    2 55.000
3    3 60.500
4    4 66.550
5    5 73.205

I feel like there has to be a dplyr or tidyverse way to do this (as much as I love my for loop).

But, something like

tdf <- data.frame(time=1:5, pop=50) %>%
  mutate(pop = 1.1*lag(pop))

which would have been my first guess just produces

  time pop
1    1  NA
2    2  55
3    3  55
4    4  55
5    5  55

I feel like I'm missing something obvious.... what is it?

Note - this is a trivial example - my real examples use multiple parameters, many of which are time-varying (I'm simulating forecasts under different GCM scenarios), so, the tidyverse is proving to be a powerful tool in bringing my simulations together.

like image 544
jebyrnes Avatar asked Oct 17 '16 20:10

jebyrnes


3 Answers

Reduce (or its purrr variants, if you like) is what you want for cumulative functions that don't already have a cum* version written:

data.frame(time = 1:5, pop = 50) %>%
    mutate(pop = Reduce(function(x, y){x * 1.1}, pop, accumulate = TRUE))

##   time    pop
## 1    1 50.000
## 2    2 55.000
## 3    3 60.500
## 4    4 66.550
## 5    5 73.205

or with purrr,

data.frame(time = 1:5, pop = 50) %>%
    mutate(pop = accumulate(pop, ~.x * 1.1))

##   time    pop
## 1    1 50.000
## 2    2 55.000
## 3    3 60.500
## 4    4 66.550
## 5    5 73.205
like image 74
alistaire Avatar answered Nov 07 '22 15:11

alistaire


If the starting value of pop is, say, 50, then pop = 50 * 1.1^(0:4) will give you the next four values. With your code, you could do:

data.frame(time=1:5, pop=50) %>%
  mutate(pop = pop * 1.1^(1:n() - 1))

Or,

base = 50

data.frame(time=1:5) %>%
  mutate(pop = base * 1.1^(1:n()-1))
like image 5
eipi10 Avatar answered Nov 07 '22 16:11

eipi10


Purrr's accumulate function can handle time-varying indices, if you pass them to your simulation function as a list with all the parameters in it. However, it takes a bit of wrangling to get this working correctly. The trick here is that accumulate() can work on list as well as vector columns. You can use the tidyr function nest() to group columns into a list vector containing the current population state and parameters, then use accumulate() on the resulting list column. This is a bit complicated to explain, so I've included a demo, simulating logistic growth with either a constant growth rate or a time-varying stochastic growth rate. I also included an example of how to use this to simulate multiple replicates for a given model using dpylr+purrr+tidyr.

library(dplyr)
library(purrr)
library(ggplot2)
library(tidyr)

# Declare the population growth function. Note: the first two arguments
# have to be .x (the prior vector of populations and parameters) and .y,
# the current parameter value and population vector. 
# This example function is a Ricker population growth model. 
logistic_growth = function(.x, .y, growth, comp) {
  pop = .x$pop[1]
  growth = .y$growth[1]
  comp  = .y$comp[1]
  # Note: this uses the state from .x, and the parameter values from .y.
  # The first observation will use the first entry in the vector for .x and .y
  new_pop = pop*exp(growth - pop*comp)
  .y$pop[1] = new_pop
  return(.y)
}

# Starting parameters the number of time steps to simulate, initial population size,
# and ecological parameters (growth rate and intraspecific competition rate)
n_steps  = 100
pop_init = 1
growth = 0.5
comp = 0.05

#First test: fixed growth rates
test1 = data_frame(time = 1:n_steps,pop = pop_init, 
                   growth=growth,comp =comp)


# here, the combination of nest() and group_by() split the data into individual 
# time points and then groups all parameters into a new vector called state.
# ungroup() removes the grouping structure, then accumulate runs the function
#on the vector of states. Finally unnest transforms it all back to a
#data frame
out1 = test1 %>%
  group_by(time)%>%
  nest(pop, growth, comp,.key = state)%>%
  ungroup()%>%
  mutate(
    state = accumulate(state,logistic_growth))%>%
  unnest()

# This is the same example, except I drew the growth rates from a normal distribution
# with a mean equal to the mean growth rate and a std. dev. of 0.1
test2 = data_frame(time = 1:n_steps,pop = pop_init, 
                   growth=rnorm(n_steps, growth,0.1),comp=comp)

out2 = test2 %>%
  group_by(time)%>%
  nest(pop, growth, comp,.key = state)%>%
  ungroup()%>%
  mutate(
    state = accumulate(state,logistic_growth))%>%
  unnest()

# This demostrates how to use this approach to simulate replicates using dplyr
# Note the crossing function creates all combinations of its input values
test3 = crossing(rep = 1:10, time = 1:n_steps,pop = pop_init, comp=comp) %>%
  mutate(growth=rnorm(n_steps*10, growth,0.1))

out3 = test3 %>%
  group_by(rep)%>%
  group_by(rep,time)%>%
  nest(pop, growth, comp,.key = state)%>%
  group_by(rep)%>%
  mutate(
    state = accumulate(state,logistic_growth))%>%
  unnest()

print(qplot(time, pop, data=out1)+
        geom_line() +
        geom_point(data= out2, col="red")+
        geom_line(data=out2, col="red")+
        geom_point(data=out3, col="red", alpha=0.1)+
        geom_line(data=out3, col="red", alpha=0.1,aes(group=rep)))
like image 4
Eric Pedersen Avatar answered Nov 07 '22 14:11

Eric Pedersen