Goals
I want to use dplyr to run simulations on grids of parameters. Specifically, I'd like a function that I can use in another program that
Here's my approach
library(dplyr)
run <- function(data, fun, fixed_parameters, ...) {
  ## ....
  ## argument checking
  ##
  fixed_parameters <- as.environment(fixed_parameters)
  grouped_out <- do_(rowwise(data), ~ do.call(fun, c(., fixed_parameters, ...)))
  ungroup(grouped_out)
}
This works. For example, for
growth <- function(n, r, K, b) {
  # some dynamical simulation
  # this is an obviously-inefficient way to do this ;)
  n + r - exp(n) / K - b - rnorm(1, 0, 0.1)
}
growth_runner <- function(r, K, b, ic, ...) {
  # a wrapper to run the simulation with some fixed values
  n0 <- ic$N0
  T <- ic$T
  reps <- ic$reps
  data.frame(n_final = replicate(reps, {
    for (t in 1:T) {
      n0 <- growth(n0, r, K, b)
    }
    n0
  }))
}
I can define and run,
data <- expand.grid(b = seq(0.01, 0.5, length.out = 10),
                    K = exp(seq(0.1, 5, length.out = 10)),
                    r = seq(0.5, 3.5, length.out = 10))
initial_data <- list(N0 = 0.9, T = 5, reps = 20)
output <- run(data, growth_runner, initial_data)
Question
Even though this seems to work, I wonder if there's a way to do it without do.call. (In part because of issues with do.call.)
I really am interested in a way to replace the line grouped_out <- do_(rowwise(data), ~ do.call(fun, c(., fixed_parameters, ...))) with something that does the same thing but without do.call. Edit: An approach that somehow avoids the performance penalties of using do.call outlined at the above link would also work.
Notes and References
.values could work in place of do.call
I found it a little tricky to follow your code, but I think this is equivalent.
First I define a function that does the computation you're interested in:
growth_t <- function(n0, r, K, b, T) {
  n <- n0
  for (t in 1:T) {
    n <- n + r - exp(n) / K - b - rnorm(1, 0, 0.1)
  }
  n
}
Then I define the data that you want to vary, including a "dummy" variable for reps:
data <- expand.grid(
  b = seq(0.01, 0.5, length.out = 5),
  K = exp(seq(0.1, 5, length.out = 5)),
  r = seq(0.5, 3.5, length.out = 5),
  rep = 1:20
)
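To see what the "dummy" rep column does, here is a small sketch on a toy grid (the name small and its values are made up for illustration). expand.grid() crosses every column with every other, so adding rep = 1:3 repeats each parameter combination three times:

```r
# Toy grid: 2 values of b, 2 values of K, 3 replicates.
small <- expand.grid(b = c(0.1, 0.2), K = c(1, 2), rep = 1:3)

# One row per (b, K, rep) combination: 2 * 2 * 3 rows in total.
n_rows <- nrow(small)

# Each (b, K) pair appears once per replicate.
counts <- table(small$b, small$K)
```

Because rep is only there to multiply the rows, it does not need to be passed to the simulation function itself.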
Then I can feed it into purrr::pmap_dbl(). pmap_dbl() does a "parallel" map: it takes a list (or data frame) as input and calls the function once per row, matching each column to the argument of the same name. The fixed parameters are supplied after the function name.
library(purrr)
data$output <- pmap_dbl(data[1:3], growth_t, n0 = 0.9, T = 5)
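If you'd rather see the same idea without purrr, the sketch below does roughly the same row-by-row, match-by-name call using base R's mapply() with MoreArgs. This is a swapped-in technique for illustration, not what purrr does internally, and toy_growth is a hypothetical deterministic stand-in for growth_t (no random noise) so the result can be checked by hand:

```r
# Hypothetical deterministic stand-in for growth_t, for illustration only.
toy_growth <- function(n0, r, K, b, T) {
  n <- n0
  for (t in seq_len(T)) {
    n <- n + r - n / K - b
  }
  n
}

grid <- expand.grid(b = c(0, 1), K = c(1, 2), r = c(1, 2))

# Varying arguments are matched by name, one element per grid row;
# fixed arguments are supplied once through MoreArgs.
grid$output <- mapply(
  toy_growth,
  b = grid$b, K = grid$K, r = grid$r,
  MoreArgs = list(n0 = 1, T = 2)
)

# A direct call with the first row's values, for comparison.
first <- toy_growth(n0 = 1, r = grid$r[1], K = grid$K[1], b = grid$b[1], T = 2)
```

Note that the columns have to be spelled out one by one here; passing the whole data frame in one go is the convenience that pmap_dbl() adds.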
This really doesn't feel like a dplyr problem to me, because it's not really about data manipulation.
The below avoids using do.call and presents the output in the same way as the OP.
First, replace the individual parameters of the function with a single vector argument; this is what apply will pass in for each row of the grid.
growth_runner <- function(data.in, ic, ...) {
  # a wrapper to run the simulation with some fixed values
  # data.in is one row of the grid, in column order: b, K, r
  n0 <- ic$N0
  T <- ic$T
  reps <- ic$reps
  data.frame(n_final = replicate(reps, {
    for (t in 1:T) {
      # growth(n, r, K, b): r is column 3, K is column 2, b is column 1
      n0 <- growth(n0, data.in[3], data.in[2], data.in[1])
    }
    n0
  }))
}
Set up the grid you want to search over, just as you did before.
data <- expand.grid(b = seq(0.01, 0.5, length.out = 10),
                    K = exp(seq(0.1, 5, length.out = 10)),
                    r = seq(0.5, 3.5, length.out = 10))
initial_data <- list(N0 = 0.9, T = 5, reps = 20)
Use apply to go through your grid row by row, then combine the results into a single data frame.
output.mid = apply(data, 1, ic=initial_data, FUN=growth_runner)
output <- data.frame('n_final'=unlist(output.mid))
And you have your output without any calls to do.call or any external library.
> dim(output)
[1] 20000 1
> head(output)
n_final
1 -0.6375070
2 -0.7617193
3 -0.3266347
4 -0.7921655
5 -0.5874983
6 -0.4083613
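One caveat with the positional indexing above (data.in[3], data.in[2], data.in[1]) is that it silently depends on the column order of the grid. As a minimal sketch, assuming a numeric-only grid, each row reaches the function as a named numeric vector, so you can index by name instead. The grid values and the toy formula below are made up for illustration:

```r
grid <- expand.grid(b = c(0.1, 0.2), K = c(1, 2), r = c(1, 3))

# apply() coerces the data frame to a matrix, so each row arrives as a
# named numeric vector; indexing by name avoids relying on column order.
by_name <- apply(grid, 1, function(row) row[["r"]] / row[["K"]] - row[["b"]])

# The same toy computation done column-wise, for comparison.
vectorised <- grid$r / grid$K - grid$b
```

This only works cleanly when every column is numeric; with mixed column types apply() would coerce the whole row to character.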