Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Nested ifelse: improved syntax

Description

ifelse() function allows to filter the values in a vector through a series of tests, each of them producing different actions in case of a positive result. For instance, let xx be a data.frame, as follows:

xx <- data.frame(a=c(1,2,1,3), b=1:4)
xx

a b
1 1
2 2
1 3
3 4

Suppose that you want to create a new column, c, from column b, but depending on the values in column a in the following way:

For each row,

  • if the value in column a is 1, the value in column c, is the same value in column b.
  • if the value in column a is 2, the value in column c, is 100 times the value in column b.
  • in any other case, the value in column c is the negative of the value in column b.

Using ifelse(), a solution could be:

xx$c <- ifelse(xx$a==1, xx$b, 
               ifelse(xx$a==2, xx$b*100,
                      -xx$b))
xx

a b c
1 1 1
2 2 200
1 3 3
3 4 -4

Problem 1

An aesthetic problem arises when the number of tests increases, say, four tests:

xx$c <- ifelse(xx$a==1, xx$b, 
           ifelse(xx$a==2, xx$b*100,
                  ifelse(xx$a==3, ...,
                         ifelse(xx$a==4, ...,
                                ...))))

I found partial solution to the problem in this page, which consists in the definition of the functions if.else_(), i_(), e_(), as follows:

library(lazyeval)
i_ <- function(if_stat, then) {
    if_stat <- lazyeval::expr_text(if_stat)
    then    <- lazyeval::expr_text(then)
    sprintf("ifelse(%s, %s, ", if_stat, then)
}

e_ <- function(else_ret) {
    else_ret <- lazyeval::expr_text(else_ret)
    else_ret
}

if.else_ <- function(...) {
    args <- list(...)

    for (i in 1:(length(args) - 1) ) {
        if (substr(args[[i]], 1, 6) != "ifelse") {
            stop("All but the last argument, need to be if.then_ functions.", call. = FALSE)
        }
    }
    if (substr(args[[length(args)]], 1, 6) == "ifelse"){
        stop("Last argument needs to be an else_ function.", call. = FALSE)
    }
    args$final <- paste(rep(')', length(args) - 1), collapse = '')
    eval_string <- do.call('paste', args)
    eval(parse(text = eval_string))
}

In this way, the problem given in the Description, can be rewritten as follows:

xx <- data.frame(a=c(1,2,1,3), b=1:4)
xx$c <- if.else_(
    i_(xx$a==1, xx$b),
    i_(xx$a==2, xx$b*100),
    e_(-xx$b)
) 
xx

a b c
1 1 1
2 2 200
1 3 3
3 4 -4

And the code for the four tests will simply be:

xx$c <- if.else_(
    i_(xx$a==1, xx$b),
    i_(xx$a==2, xx$b*100),
    i_(xx$a==3, ...), # dots meaning actions for xx$a==3
    i_(xx$a==4, ...), # dots meaning actions for xx$a==4
    e_(...)           # dots meaning actions for any other case
) 

Problem 2 & Question

The given code apparently solves the problem. Then, I wrote the following test function:

test.ie <- function() {
    dd <- data.frame(a=c(1,2,1,3), b=1:4)
    if.else_(
        i_(dd$a==1, dd$b),
        i_(dd$a==2, dd$b*100),
        e_(-dd$b)
    ) # it should give c(1, 200, 3, -4)
}

When I tried the test:

 test.ie()

it spit the following error message:

Error in ifelse(dd$a == 1, dd$b, ifelse(dd$a == 2, dd$b * 100, -dd$b)) :
object 'dd' not found

Question

Since the if.else_() syntactic constructor is not supposed to run only from the console, is there a way for it to 'know' the variables from the function that calls it?

Note

In "Best way to replace a lengthy ifelse structure in R", a similar problem was posted. However, the given solution there focuses on building the table's new column with the given constant output values (the "then" or "else" slots of the ifelse() function), whereas my case addresses a syntactic problem in which the "then" or "else" slots can even be expressions in terms of other data.frame elements or variables.

like image 608
JulioSergio Avatar asked Dec 13 '22 22:12

JulioSergio


2 Answers

I think you can use dplyr::case_when inside dplyr::mutate to achieve this.

library(dplyr)

df <- tibble(a=c(1,2,1,3), b=1:4)

df %>% 
  mutate(
    foo = case_when(
      .$a == 1 ~ .$b,
      .$a == 2 ~ .$b * 100L,
      TRUE   ~ .$b * -1L
    )
  )

#> # A tibble: 4 x 3
#>       a     b   foo
#>   <dbl> <int> <int>
#> 1     1     1     1
#> 2     2     2   200
#> 3     1     3     3
#> 4     3     4    -4

In the upcoming relase of dplyr 0.6.0 you won't need to use the akward work-around of .$, and you can just use:

df %>% 
  mutate(
    foo = case_when(
      a == 1 ~ b,
      a == 2 ~ b * 100L,
      TRUE   ~ b * -1L
    )
  )
like image 197
austensen Avatar answered Dec 30 '22 18:12

austensen


Taking into account MrFlick's advice, I re-coded the if.else_() function as follows:

if.else_ <- function(...) {
    args <- list(...)

    for (i in 1:(length(args) - 1) ) {
        if (substr(args[[i]], 1, 6) != "ifelse") {
            stop("All but the last argument, need to be if.then_ functions.", call. = FALSE)
        }
    }
    if (substr(args[[length(args)]], 1, 6) == "ifelse"){
        stop("Last argument needs to be an else_ function.", call. = FALSE)
    }
    args$final <- paste(rep(')', length(args) - 1), collapse = '')
    eval_string <- do.call('paste', args)
    eval(parse(text = eval_string), envir = parent.frame())
}

Now the test.ie() function runs properly

test.ie()

[1] 1 200 3 -4

like image 23
JulioSergio Avatar answered Dec 30 '22 18:12

JulioSergio