
Non-standard evaluation of subset argument with mapply in R

I cannot use the subset argument of xtabs or aggregate (or of any other function I tested, including ftable and lm) with mapply. The following calls fail with the subset argument, but work without it:

mapply(FUN = xtabs,
       formula = list(~ wool,
                      ~ wool + tension),
       subset = list(breaks < 15,
                     breaks < 20),
       MoreArgs = list(data = warpbreaks))

# Error in mapply(FUN = xtabs, formula = list(~wool, ~wool + tension), subset = list(breaks <  : 
#   object 'breaks' not found
# 
# expected result 1/2:
# wool
# A B 
# 2 2
# 
# expected result 2/2:
#     tension
# wool L M H
#    A 0 4 3
#    B 2 2 5

mapply(FUN = aggregate,
       formula = list(breaks ~ wool,
                      breaks ~ wool + tension),
       subset = list(breaks < 15,
                     breaks < 20),
       MoreArgs = list(data = warpbreaks,
                       FUN = length))

# Error in mapply(FUN = aggregate, formula = list(breaks ~ wool, breaks ~  : 
#   object 'breaks' not found
# 
# expected result 1/2:
#   wool breaks
# 1    A      2
# 2    B      2
# 
# expected result 2/2:
#   wool tension breaks
# 1    B       L      2
# 2    A       M      4
# 3    B       M      2
# 4    A       H      3
# 5    B       H      5

The errors seem to be due to subset arguments not being evaluated in the right environment. I know I can subset in the data argument with data = warpbreaks[warpbreaks$breaks < 20, ] as a workaround, but I am looking to improve my knowledge of R.

My questions are:

  • How can I use subset arguments with mapply? I tried with match.call and eval.parent, but without success so far (more details in my previous questions).
  • Why is the formula argument evaluated in data = warpbreaks, but the subset argument is not?
Thomas asked Jun 29 '19 13:06



1 Answer

The short answer is that when you create a list to pass as an argument to a function, it is evaluated at the point of creation. The error you are getting is because R tries to create the list you want to pass in the calling environment.

To see this more clearly, suppose you try creating the arguments you want to pass ahead of calling mapply:

f_list <- list(~ wool, ~ wool + tension)
d_list <- list(data = warpbreaks)
mapply(FUN = xtabs, formula = f_list, MoreArgs = d_list)
#> [[1]]
#> wool
#>  A  B 
#> 27 27 
#> 
#> [[2]]
#>     tension
#> wool L M H
#>    A 9 9 9
#>    B 9 9 9

There is no problem with creating a list of formulas, because these are not evaluated until needed, and of course warpbreaks is accessible from the global environment, hence this call to mapply works.
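As a small illustration of that point (a sketch, not part of the original answer): a formula is a quoted object. Creating it only records the symbols and the environment it was created in, so no variable lookup happens until a modelling function evaluates it against data. The names f and tab are arbitrary.

```r
# Creating a formula does not look up `wool` or `tension`.
f <- ~ wool + tension
class(f)        # the object is of class "formula"
all.vars(f)     # the symbols stored in the formula
environment(f)  # the environment the formula was created in

# The variables are resolved only when a function evaluates
# the formula against a data frame.
tab <- xtabs(f, data = warpbreaks)
tab
```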

Of course, if you try to create the following list ahead of the mapply call:

subset_list <- list(breaks < 15, breaks < 20)

Then R will tell you that breaks isn't found.
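The contrast between eager and deferred evaluation can be sketched as follows. This assumes that no object named breaks exists in the workspace; msg and quoted are illustrative names.

```r
# list() evaluates its arguments immediately, so an unquoted expression
# fails when `breaks` is not visible in the calling environment.
msg <- tryCatch(
  {
    list(breaks < 15, breaks < 20)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# quote() stores the expressions unevaluated, so building the list succeeds.
quoted <- list(quote(breaks < 15), quote(breaks < 20))

# The stored expressions can be evaluated later, with the data frame
# supplying the variables.
n_below_15 <- sum(eval(quoted[[1]], warpbreaks))
n_below_15
```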

However, if you build the list inside with(warpbreaks, ...), so that the columns of warpbreaks are visible when the list is evaluated, then you won't have a problem:

subset_list <- with(warpbreaks, list(breaks < 15, breaks < 20))
subset_list
#> [[1]]
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [14]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
#> [27] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [40] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
#> [53] FALSE FALSE
#> 
#> [[2]]
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
#> [14]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE
#> [27] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
#> [40]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
#> [53]  TRUE FALSE

So you might expect that we could simply pass this to mapply and everything would be fine, but now we get a new error:

mapply(FUN = xtabs, formula = f_list, subset = subset_list, MoreArgs = d_list)
#> Error in eval(substitute(subset), data, env) : object 'dots' not found

So why are we getting this?

The problem arises with any function passed to mapply that uses non-standard evaluation, that is, one that calls eval(substitute(...)) on its arguments, or that calls another function which does.

If you look at the source code for mapply you will see that it takes the extra arguments you have passed and puts them in a list called dots, which it will then pass to an internal mapply call:

mapply
#> function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE) 
#> {
#>     FUN <- match.fun(FUN)
#>     dots <- list(...)
#>     answer <- .Internal(mapply(FUN, dots, MoreArgs))
#> ...

If FUN, or a function it calls, captures an argument with substitute and evaluates it with eval, it will therefore try to evaluate an expression that refers to dots, an object that does not exist in the environment where eval is called. This is easy to see by calling mapply on a match.call wrapper:

mapply(function(x) match.call(), x = list(1))
#> [[1]]
#> (function(x) match.call())(x = dots[[1L]][[1L]])

So a minimal reproducible example of our error is:

mapply(function(x) eval(substitute(x)), x = list(1))
#> Error in eval(substitute(x)) : object 'dots' not found
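To make the mechanism visible without triggering the error, the captured expression can be deparsed instead of evaluated. This is only a sketch; seen and direct are illustrative names.

```r
# Inside FUN, the argument is a promise whose expression refers to
# mapply's internal `dots` list, not to what the user typed.
seen <- mapply(function(x) deparse(substitute(x)), x = list(1))
seen

# A direct call, by contrast, keeps the expression the caller wrote.
direct <- (function(x) deparse(substitute(x)))(1 + 2)
direct
```

Because eval then runs that captured expression in an environment where dots does not exist, the lookup fails.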

So what's the solution? It seems like you have already hit on a perfectly good one, that is, manually subsetting the data frame you wish to pass. Others may suggest that you explore purrr::map to get a more elegant solution.

However, it is possible to get mapply to do what you want, and the secret is just to modify FUN to turn it into an anonymous wrapper of xtabs that subsets on the fly:

mapply(FUN = function(formula, subset, data) xtabs(formula, data[subset, ]),
       formula = list(~ wool, ~ wool + tension),
       subset = with(warpbreaks, list(breaks < 15, breaks < 20)),
       MoreArgs = list(data = warpbreaks))
#> [[1]]
#> wool
#> A B 
#> 2 2 
#> 
#> [[2]]
#>     tension
#> wool L M H
#>    A 0 4 3
#>    B 2 2 5
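If the goal is to keep using the subset argument itself, one possible alternative (a sketch, not taken from the original answer) is to pass the conditions as quoted expressions and splice them into a complete xtabs call with bquote before evaluating it. The argument names f and s and the result name res are arbitrary.

```r
res <- mapply(
  FUN = function(f, s) {
    # Build the full call first, so that xtabs() sees
    # `subset = breaks < 15` rather than a reference to `dots`.
    eval(bquote(xtabs(.(f), data = warpbreaks, subset = .(s))))
  },
  f = list(~ wool, ~ wool + tension),
  s = list(quote(breaks < 15), quote(breaks < 20)),
  SIMPLIFY = FALSE
)
res
```

The conditions are wrapped in quote() so that list() does not evaluate them up front; they are only evaluated inside xtabs, where the columns of warpbreaks are available.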
Allan Cameron answered Oct 13 '22 02:10
