Using standard evaluation and do_ to run simulations on a grid of parameters without do.call

Question

Goals

I want to use dplyr to run simulations on grids of parameters. Specifically, I'd like a function that I can use in another program and that:

is passed a data.frame
for every row, runs a simulation using each column as an argument
is also passed some extra data (e.g., initial conditions)

Here's my approach

library(dplyr)
run <- function(data, fun, fixed_parameters, ...) {
   ## ....
   ## argument checking
   ##

   fixed_parameters <- as.environment(fixed_parameters)
   grouped_out <- do_(rowwise(data), ~ do.call(fun, c(., fixed_parameters, ...)))
   ungroup(grouped_out)
 }

This works. For example, for

growth <- function(n, r, K, b) {
  # some dynamical simulation
  # this is an obviously-inefficient way to do this ;)
  n  + r - exp(n) / K - b - rnorm(1, 0, 0.1)
}
growth_runner <- function(r, K, b, ic, ...) {
  # a wrapper to run the simulation with some fixed values
  n0 <- ic$N0
  T <- ic$T
  reps <- ic$reps
  data.frame(n_final = replicate(reps, {
    for (t in 1:T) {
      n0 <- growth(n0, r, K, b)
    }
    n0
  }))
}

I can define and run,

   data <- expand.grid(b = seq(0.01, 0.5, length.out=10),
                       K = exp(seq(0.1, 5, length.out=10)),
                       r = seq(0.5, 3.5, length.out=10))
   initial_data = list(N0=0.9, T=5, reps=20)
   output <- run(data, growth_runner, initial_data)

Question

Even though this seems to work, I wonder if there's a way to do it without do.call. (In part because of issues with do.call.)

I really am interested in a way to replace the line grouped_out <- do_(rowwise(data), ~ do.call(fun, c(., fixed_parameters, ...))) with something that does the same thing but without do.call. Edit: An approach that somehow avoids the performance penalties of using do.call outlined at the above link would also work.

Notes and References

This question on do.call and standard evaluation in dplyr is helpful, but I'm looking for a way to avoid do.call if possible.
dplyr's nse vignette was helpful in writing this, and it makes me think .values could work in place of do.call.

hadley · Accepted Answer

I found it a little tricky to follow your code, but I think this is equivalent.

First I define a function that does the computation you're interested in:

growth_t <- function(n0, r, K, b, T) {
  n <- n0

  for (t in 1:T) {
    n <- n + r - exp(n) / K - b - rnorm(1, 0, 0.1)
  }
  n
}

Then I define the data that you want to vary, including a "dummy" variable for reps:

data <- expand.grid(
  b = seq(0.01, 0.5, length.out = 5),
  K = exp(seq(0.1, 5, length.out = 5)),
  r = seq(0.5, 3.5, length.out = 5),
  rep = 1:20
)

Then I can feed it into purrr::pmap_dbl(). pmap_dbl() does a "parallel" map: it takes a list (or data frame) as input and calls the function once per row, matching each column to the argument of the same name, and it returns a double vector. The fixed parameters are supplied after the function name.

library(purrr)
data$output <- pmap_dbl(data[1:3], growth_t, n0 = 0.9, T = 5)

This really doesn't feel like a dplyr problem to me, because it's not really about data manipulation.
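
For readers who prefer to stay in base R, the same "vary some arguments, fix the others" pattern can be sketched with mapply() and its MoreArgs argument. This is not the method from the answer above, only a rough base R counterpart; the smaller grid and the names grid and mean_by_params are assumptions made for illustration.

```r
# Same simulation function as in the answer above.
growth_t <- function(n0, r, K, b, T) {
  n <- n0
  for (t in seq_len(T)) {
    n <- n + r - exp(n) / K - b - rnorm(1, 0, 0.1)
  }
  n
}

# A deliberately small grid (assumption: 3 values per parameter, 2 reps).
grid <- expand.grid(
  b = seq(0.01, 0.5, length.out = 3),
  K = exp(seq(0.1, 5, length.out = 3)),
  r = seq(0.5, 3.5, length.out = 3),
  rep = 1:2
)

set.seed(42)
# Varying arguments are passed by name; fixed ones go in MoreArgs.
grid$output <- mapply(
  growth_t,
  b = grid$b, K = grid$K, r = grid$r,
  MoreArgs = list(n0 = 0.9, T = 5)
)

# Collapse the replicates: one mean per (b, K, r) combination.
mean_by_params <- aggregate(output ~ b + K + r, data = grid, FUN = mean)
```

Because the replicate index is just another column of the grid, summarising over it is an ordinary grouped aggregation.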

Tchotchke · Answer

The approach below avoids using do.call and presents the output in the same way as the OP.

First, replace the function's individual parameters with a single vector argument. apply will pass each row of the grid as that vector, so data.in[1] is b, data.in[2] is K, and data.in[3] is r.

growth_runner <- function(data.in, ic, ...) {
  # a wrapper to run the simulation with some fixed values
  n0 <- ic$N0
  T <- ic$T
  reps <- ic$reps
  data.frame(n_final = replicate(reps, {
    for (t in 1:T) {
      n0 <- growth(n0, data.in[3], data.in[2], data.in[1])
    }
    n0
  }))
}

Set up the grid you want to search over, just as you did before.

data <- expand.grid(b = seq(0.01, 0.5, length.out=10),
                    K = exp(seq(0.1, 5, length.out=10)),
                    r = seq(0.5, 3.5, length.out=10))
initial_data = list(N0=0.9, T=5, reps=20)

Use apply to go through your grid row by row, then combine the results.

output.mid = apply(data, 1, ic=initial_data, FUN=growth_runner)
output <- data.frame('n_final'=unlist(output.mid))

And you have your output without any calls to do.call or any external library.

> dim(output)
[1] 20000     1
> head(output)
     n_final
1 -0.6375070
2 -0.7617193
3 -0.3266347
4 -0.7921655
5 -0.5874983
6 -0.4083613
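
Since apply() hands each row to the function as a named numeric vector, the parameters can also be looked up by name rather than by position, and returned next to the simulated values so that each result stays tied to its parameter combination. The following is a minimal sketch under those assumptions; growth_runner_named, grid, ic and output_named are hypothetical names, and the grid is kept small on purpose.

```r
# Same model as in the question.
growth <- function(n, r, K, b) {
  n + r - exp(n) / K - b - rnorm(1, 0, 0.1)
}

# Hypothetical variant of growth_runner: reads parameters by name and
# returns them alongside the final values of each replicate.
growth_runner_named <- function(params, ic) {
  finals <- replicate(ic$reps, {
    n <- ic$N0
    for (t in seq_len(ic$T)) {
      n <- growth(n, params[["r"]], params[["K"]], params[["b"]])
    }
    n
  })
  data.frame(b = params[["b"]], K = params[["K"]], r = params[["r"]],
             n_final = finals)
}

# Small illustrative grid: 2 x 2 x 2 combinations, 3 replicates each.
grid <- expand.grid(b = c(0.01, 0.5), K = exp(c(0.1, 5)), r = c(0.5, 3.5))
ic <- list(N0 = 0.9, T = 5, reps = 3)

set.seed(1)
# One data frame per grid row, then stacked without do.call.
pieces <- apply(grid, 1, growth_runner_named, ic = ic)
output_named <- Reduce(rbind, pieces)
```

Reduce(rbind, ...) is fine for a handful of pieces; for large grids it copies repeatedly, so a different way of stacking the results may be preferable.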

Tags:

design-patterns

r

simulation

tidyverse

jaimedash
