I am trying to come up with a variant of mapply
(call it xapply
for now) that combines the functionality (sort of) of expand.grid
and mapply
. That is, for a function FUN
and a list of arguments L1
, L2
, L3
, ... of unknown length, it should produce a list of length n1*n2*n3
(where ni
is the length of list i
) which is the result of applying FUN
to all combinations of the elements of the list.
If expand.grid
worked to generate lists of lists rather than data frames, one might be able to use it, but I have in mind that the lists may be lists of things that won't necessarily fit into a data frame nicely.
This function works OK if there are exactly three lists to expand, but I am curious about a more generic solution. (FLATTEN
is unused, but I can imagine that FLATTEN=FALSE
would generate nested lists rather than a single list ...)
xapply3 <- function(FUN,L1,L2,L3,FLATTEN=TRUE,MoreArgs=NULL) {
retlist <- list()
count <- 1
for (i in seq_along(L1)) {
for (j in seq_along(L2)) {
for (k in seq_along(L3)) {
retlist[[count]] <- do.call(FUN,c(list(L1[[i]],L2[[j]],L3[[k]]),MoreArgs))
count <- count+1
}
}
}
retlist
}
edit: forgot to return the result. One might be able to solve this by making a list of the indices with combn
and going from there ...
I think I have a solution to my own question, but perhaps someone can do better (and I haven't implemented FLATTEN=FALSE
...)
xapply <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
L <- list(...)
inds <- do.call(expand.grid,lapply(L,seq_along)) ## Marek's suggestion
retlist <- list()
for (i in 1:nrow(inds)) {
arglist <- mapply(function(x,j) x[[j]],L,as.list(inds[i,]),SIMPLIFY=FALSE)
if (FLATTEN) {
retlist[[i]] <- do.call(FUN,c(arglist,MoreArgs))
}
}
retlist
}
edit: I tried @baptiste's suggestion, but it's not easy (or wasn't for me). The closest I got was
xapply2 <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
L <- list(...)
xx <- do.call(expand.grid,L)
f <- function(...) {
do.call(FUN,lapply(list(...),"[[",1))
}
mlply(xx,f)
}
which still doesn't work. expand.grid
is indeed more flexible than I thought (although it creates a weird data frame that can't be printed), but enough magic is happening inside mlply
that I can't quite make it work.
Here is a test case:
L1 <- list(data.frame(x=1:10,y=1:10),
data.frame(x=runif(10),y=runif(10)),
data.frame(x=rnorm(10),y=rnorm(10)))
L2 <- list(y~1,y~x,y~poly(x,2))
z <- xapply(lm,L2,L1)
xapply(lm,L2,L1)
@ben-bolker, I had a similar desire and think I have a preliminary solution worked out, that I've also tested to work in parallel. The function, which I somewhat confusingly called gmcmapply
(g for grid) takes an arbitrarily large named list mvars
(that gets expand.grid
-ed within the function) and a FUN
that utilizes the list names as if they were arguments to the function itself (gmcmapply
will update the formals of FUN
so that by the time FUN
is passed to mcmapply
it's arguments reflect the variables that the user would like to iterate over (which would be layers in a nested for loop)). mcmapply
then dynamically updates the values of these formals as it cycles over the expanded set of variables in mvars
.
I've posted the preliminary code as a gist (reprinted with an example below) and would be curious to get your feedback on it. I'm a grad student, that is self-described as an intermediately-skilled R enthusiast, so this is pushing my R skills for sure. You or other folks in the community may have suggestions that would improve on what I have. I do think even as it stands, I'll be coming to this function quite a bit in the future.
gmcmapply <- function(mvars, FUN, SIMPLIFY = FALSE, mc.cores = 1, ...){
require(parallel)
FUN <- match.fun(FUN)
funArgs <- formals(FUN)[which(names(formals(FUN)) != "...")] # allow for default args to carry over from FUN.
expand.dots <- list(...) # allows for expanded dot args to be passed as formal args to the user specified function
# Implement non-default arg substitutions passed through dots.
if(any(names(funArgs) %in% names(expand.dots))){
dot_overwrite <- names(funArgs[which(names(funArgs) %in% names(expand.dots))])
funArgs[dot_overwrite] <- expand.dots[dot_overwrite]
#for arg naming and matching below.
expand.dots[dot_overwrite] <- NULL
}
## build grid of mvars to loop over, this ensures that each combination of various inputs is evaluated (equivalent to creating a structure of nested for loops)
grid <- expand.grid(mvars,KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
# specify formals of the function to be evaluated by merging the grid to mapply over with expanded dot args
argdefs <- rep(list(bquote()), ncol(grid) + length(expand.dots) + length(funArgs) + 1)
names(argdefs) <- c(colnames(grid), names(funArgs), names(expand.dots), "...")
argdefs[which(names(argdefs) %in% names(funArgs))] <- funArgs # replace with proper dot arg inputs.
argdefs[which(names(argdefs) %in% names(expand.dots))] <- expand.dots # replace with proper dot arg inputs.
formals(FUN) <- argdefs
if(SIMPLIFY) {
#standard mapply
do.call(mcmapply, c(FUN, c(unname(grid), mc.cores = mc.cores))) # mc.cores = 1 == mapply
} else{
#standard Map
do.call(mcmapply, c(FUN, c(unname(grid), SIMPLIFY = FALSE, mc.cores = mc.cores)))
}
}
example code below:
# Example 1:
# just make sure variables used in your function appear as the names of mvars
myfunc <- function(...){
return_me <- paste(l3, l1^2 + l2, sep = "_")
return(return_me)
}
mvars <- list(l1 = 1:10,
l2 = 1:5,
l3 = letters[1:3])
### list output (mapply)
lreturns <- gmcmapply(mvars, myfunc)
### concatenated output (Map)
lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE)
## N.B. This is equivalent to running:
lreturns <- c()
for(l1 in 1:10){
for(l2 in 1:5){
for(l3 in letters[1:3]){
lreturns <- c(lreturns,myfunc(l1,l2,l3))
}
}
}
### concatenated outout run on 2 cores.
lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE, mc.cores = 2)
Example 2. Pass non-default args to FUN.
## Since the apply functions dont accept full calls as inputs (calls are internal), user can pass arguments to FUN through dots, which can overwrite a default option for FUN.
# e.g. apply(x,1,FUN) works and apply(x,1,FUN(arg_to_change= not_default)) does not, the correct way to specify non-default/additional args to FUN is:
# gmcmapply(mvars, FUN, arg_to_change = not_default)
## update myfunc to have a default argument
myfunc <- function(rep_letters = 3, ...){
return_me <- paste(rep(l3, rep_letters), l1^2 + l2, sep = "_")
return(return_me)
}
lreturns <- gmcmapply(mvars, myfunc, rep_letters = 1)
A bit of additional functionality I would like to add but am still trying to work out is
cleaning up the output to be a pretty nested list with the names of mvars (normally, I'd create multiple lists within a nested for loop and tag lower-level lists onto higher level lists all the way up until all layers of the gigantic nested loop were done). I think using some abstracted variant of the solution provided here will work, but I haven't figured out how to make the solution flexible to the number of columns in the expand.grid
-ed data.frame.
I would like an option to log the outputs of the child processesthat get called in mcmapply
in a user-specified directory. So you could look at .txt outputs from every combination of variables generated by expand.grid
(i.e. if the user prints model summaries or status messages as a part of FUN
as I often do). I think a feasible solution is to use the substitute()
and body()
functions, described here to edit FUN
to open a sink()
at the beginning of FUN
and close it at the end if the user specifies a directory to write to. Right now, I just program it right into FUN
itself, but later it would be nice to just pass gmcmapply
an argument called something like log_children = "path_to_log_dir
. and then editing the body of the function to (pseudocode) sink(file = file.path(log_children, paste0(paste(names(mvars), sep = "_"), ".txt")
Let me know what you think!
-Nate
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With