So, while lag
and lead
in dplyr are great, I want to simulate a timeseries of something like population growth. My old school code would look something like:
tdf <- data.frame(time=1:5, pop=50)
for(i in 2:5){
tdf$pop[i] = 1.1*tdf$pop[i-1]
}
which produces
time pop
1 1 50.000
2 2 55.000
3 3 60.500
4 4 66.550
5 5 73.205
I feel like there has to be a dplyr
or tidyverse
way to do this (as much as I love my for loop).
But, something like
tdf <- data.frame(time=1:5, pop=50) %>%
mutate(pop = 1.1*lag(pop))
which would have been my first guess just produces
time pop
1 1 NA
2 2 55
3 3 55
4 4 55
5 5 55
I feel like I'm missing something obvious.... what is it?
Note - this is a trivial example - my real examples use multiple parameters, many of which are time-varying (I'm simulating forecasts under different GCM scenarios), so, the tidyverse is proving to be a powerful tool in bringing my simulations together.
Reduce
(or its purrr variants, if you like) is what you want for cumulative functions that don't already have a cum*
version written:
data.frame(time = 1:5, pop = 50) %>%
mutate(pop = Reduce(function(x, y){x * 1.1}, pop, accumulate = TRUE))
## time pop
## 1 1 50.000
## 2 2 55.000
## 3 3 60.500
## 4 4 66.550
## 5 5 73.205
or with purrr,
data.frame(time = 1:5, pop = 50) %>%
mutate(pop = accumulate(pop, ~.x * 1.1))
## time pop
## 1 1 50.000
## 2 2 55.000
## 3 3 60.500
## 4 4 66.550
## 5 5 73.205
If the starting value of pop
is, say, 50, then pop = 50 * 1.1^(0:4)
will give you the next four values. With your code, you could do:
data.frame(time=1:5, pop=50) %>%
mutate(pop = pop * 1.1^(1:n() - 1))
Or,
base = 50
data.frame(time=1:5) %>%
mutate(pop = base * 1.1^(1:n()-1))
Purrr's accumulate function can handle time-varying indices, if you pass them
to your simulation function as a list with all the parameters in it. However, it takes a bit of wrangling to get this working correctly. The trick here is that accumulate() can work on list as well as vector columns. You can use the tidyr
function nest() to group columns into a list vector containing the current population state and parameters, then use accumulate() on the resulting list column. This is a bit complicated to explain, so I've included a demo, simulating logistic growth with either a constant growth rate or a time-varying stochastic growth rate. I also included an example of how to use this to simulate multiple replicates for a given model using dpylr+purrr+tidyr.
library(dplyr)
library(purrr)
library(ggplot2)
library(tidyr)
# Declare the population growth function. Note: the first two arguments
# have to be .x (the prior vector of populations and parameters) and .y,
# the current parameter value and population vector.
# This example function is a Ricker population growth model.
logistic_growth = function(.x, .y, growth, comp) {
pop = .x$pop[1]
growth = .y$growth[1]
comp = .y$comp[1]
# Note: this uses the state from .x, and the parameter values from .y.
# The first observation will use the first entry in the vector for .x and .y
new_pop = pop*exp(growth - pop*comp)
.y$pop[1] = new_pop
return(.y)
}
# Starting parameters the number of time steps to simulate, initial population size,
# and ecological parameters (growth rate and intraspecific competition rate)
n_steps = 100
pop_init = 1
growth = 0.5
comp = 0.05
#First test: fixed growth rates
test1 = data_frame(time = 1:n_steps,pop = pop_init,
growth=growth,comp =comp)
# here, the combination of nest() and group_by() split the data into individual
# time points and then groups all parameters into a new vector called state.
# ungroup() removes the grouping structure, then accumulate runs the function
#on the vector of states. Finally unnest transforms it all back to a
#data frame
out1 = test1 %>%
group_by(time)%>%
nest(pop, growth, comp,.key = state)%>%
ungroup()%>%
mutate(
state = accumulate(state,logistic_growth))%>%
unnest()
# This is the same example, except I drew the growth rates from a normal distribution
# with a mean equal to the mean growth rate and a std. dev. of 0.1
test2 = data_frame(time = 1:n_steps,pop = pop_init,
growth=rnorm(n_steps, growth,0.1),comp=comp)
out2 = test2 %>%
group_by(time)%>%
nest(pop, growth, comp,.key = state)%>%
ungroup()%>%
mutate(
state = accumulate(state,logistic_growth))%>%
unnest()
# This demostrates how to use this approach to simulate replicates using dplyr
# Note the crossing function creates all combinations of its input values
test3 = crossing(rep = 1:10, time = 1:n_steps,pop = pop_init, comp=comp) %>%
mutate(growth=rnorm(n_steps*10, growth,0.1))
out3 = test3 %>%
group_by(rep)%>%
group_by(rep,time)%>%
nest(pop, growth, comp,.key = state)%>%
group_by(rep)%>%
mutate(
state = accumulate(state,logistic_growth))%>%
unnest()
print(qplot(time, pop, data=out1)+
geom_line() +
geom_point(data= out2, col="red")+
geom_line(data=out2, col="red")+
geom_point(data=out3, col="red", alpha=0.1)+
geom_line(data=out3, col="red", alpha=0.1,aes(group=rep)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With