
Programming-safe version of subset - to evaluate its condition while called from another function

Tags:

r

As subset() manual states:

Warning: This is a convenience function intended for use interactively

From this great article I learned not only the reason behind this warning, but also gained a good understanding of substitute(), match.call(), eval(), quote(), call(), promises, and other related R subjects, which are a little complicated.

Now I understand what the warning above is for. A super-simple implementation of subset() could be as follows:

subset = function(x, condition) x[eval(substitute(condition), envir=x),]

While subset(mtcars, cyl == 4) returns the rows of mtcars that satisfy cyl == 4, wrapping subset() in another function fails:

sub = function(x, condition) subset(x, condition)

sub(mtcars, cyl == 4)
# Error in eval(expr, envir, enclos) : object 'cyl' not found

The original version of subset() produces exactly the same error. This is a limitation of the substitute()/eval() pair: it works when subset() is called directly with cyl == 4, but when the condition is passed through the wrapper sub(), substitute(condition) inside subset() no longer returns cyl == 4. It returns the symbol condition from the body of sub(). Evaluating that symbol forces the original argument, cyl == 4, in the environment sub() was called from rather than inside x, and no object named cyl exists there, so eval() fails.
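The difference can be seen without any data frame at all. The following is a minimal sketch; show_expr() and show_nested() are hypothetical helper names introduced only for this illustration:

```r
# Hypothetical helpers: show_expr() returns whatever substitute() captures,
# show_nested() forwards its argument the same way sub() does above.
show_expr   <- function(condition) substitute(condition)
show_nested <- function(condition) show_expr(condition)

direct <- show_expr(cyl == 4)    # captures the call cyl == 4
nested <- show_nested(cyl == 4)  # captures only the symbol `condition`

is.call(direct)  # TRUE
is.name(nested)  # TRUE
```

In the nested case the expression the user typed is still reachable, but only through the promise held in the wrapper's frame, which is not where eval() is looking when it is given the data frame as its environment.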

But is there any other implementation of subset() with exactly the same arguments that would be programming-safe, i.e. able to evaluate its condition when it is called from another function?

Asked Oct 11 '12 23:10 by Ali


2 Answers

The [ function is what you're looking for (see ?"["). mtcars[mtcars$cyl == 4, ] gives the same result as the subset() call and is "programming" safe.

sub = function(x, condition) {
 x[condition,]
}

sub(mtcars, mtcars$cyl==4)

This does what you're asking without the implicit with() in the function call. The specifics are complicated, but consider a function like:

sub = function(x, quoted_condition) {
  x[with(x, eval(parse(text=quoted_condition))),]
}

sub(mtcars, 'cyl==4')

This comes close to what you're looking for, but there are edge cases where it will have unexpected results.
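One way to avoid parsing strings, sketched here under the assumption that callers are willing to wrap the condition in quote() themselves, is to pass an already-quoted call and evaluate it inside the data frame. sub_quoted() and wrapper() are hypothetical names, not part of base R:

```r
# Hypothetical function: `cond` must be a quoted call such as quote(cyl == 4).
sub_quoted <- function(x, cond) {
  # Look up names in the columns of x first, then in the calling frame.
  keep <- eval(cond, envir = x, enclos = parent.frame())
  # Treat NA as FALSE, as subset() does, and keep the data frame shape.
  x[keep & !is.na(keep), , drop = FALSE]
}

# The quoted call is an ordinary value, so it survives being forwarded.
wrapper <- function(x, cond) sub_quoted(x, cond)

res <- wrapper(mtcars, quote(cyl == 4))
nrow(res)
```

The trade-off is that the calling syntax changes: the condition has to be quoted at the top level, so this is not a drop-in replacement with the same arguments as subset().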


Using data.table and its [ subset function, you can get the implicit with(...) you're looking for.

library(data.table)
MT = data.table(mtcars)

MT[cyl==4]

There are better, faster ways to do this subsetting in data.table, but this illustrates the point well.


Using data.table, you can also construct expressions to be evaluated later:

cond = expression(cyl==4)

MT[eval(cond)]

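Both the table and the expression can now be passed through functions:

wrapper = function(DT, condition) {
  # condition holds an expression object; data.table evaluates it within DT
  DT[eval(condition)]
}

wrapper(MT, cond)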
Answered Sep 19 '22 16:09 by Justin


Here's an alternative version of subset() which continues to work even when it's nested -- at least as long as the logical subsetting expression (e.g. cyl == 4) is supplied to the top-level function call.

It works by climbing up the call stack, substitute()ing at each step to ultimately capture the logical subsetting expression passed in by the user. In the call to sub2() below, for example, the for loop works up the call stack from expr to condition to BB and finally to cyl == 4.

SUBSET <- function(`_dat`, expr) {
    ff <- sys.frames()
    ex <- substitute(expr)
    ii <- rev(seq_along(ff))
    for(i in ii) {
        ex <- eval(substitute(substitute(x, env=sys.frames()[[n]]),
                              env = list(x = ex, n=i)))
    }
    `_dat`[eval(ex, envir = `_dat`),]
}

## Define test functions that nest SUBSET() more and more deeply
sub <- function(x, condition) SUBSET(x, condition)
sub2 <- function(AA, BB) sub(AA, BB)

## Show that it works, at least when the top-level function call
## contains the logical subsetting expression
a <- SUBSET(mtcars, cyl == 4)  ## Direct call to SUBSET()
b <- sub(mtcars, cyl == 4)     ## SUBSET() called one level down
c <- sub2(mtcars, cyl == 4)    ## SUBSET() called two levels down

identical(a,b)
# [1] TRUE
identical(a,c)
# [1] TRUE
a[1:5,]
#                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
# Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
# Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
# Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
# Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2

** For some explanation of the construct inside the for loop, see Section 6.2, paragraph 6 of the R Language Definition manual.
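The reason for the nested substitute() can be shown in isolation. This is a minimal sketch with a hypothetical function f(); it is not taken from the manual. Because substitute() does not evaluate its first argument, an expression stored in a variable has to be spliced into the call before that call is evaluated:

```r
ex <- quote(condition)   # an expression held in a variable

f <- function(condition) {
  env <- environment()
  # Naive attempt: substitute() sees the symbol `ex` itself, not its value,
  # and `ex` is not bound in env, so it comes back unchanged.
  naive <- substitute(ex, env)
  # The outer substitute() builds the call substitute(condition, env);
  # eval() then runs it, returning the expression behind the promise.
  built <- eval(substitute(substitute(x, env), list(x = ex)))
  list(naive = naive, built = built)
}

out <- f(cyl == 4)
out$naive   # the symbol ex
out$built   # the call cyl == 4
```

SUBSET() applies the same step once per frame on the call stack, which is how it gets from the innermost argument name back to the expression typed at the top level.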

Answered Sep 20 '22 16:09 by Josh O'Brien