
Why does dplyr error in this nested if_else when the logical condition means the output should not be evaluated?

Tags: r, dplyr

I have a nested if_else statement inside mutate. In my example data frame:

tmp_df2 <- data.frame(a = c(1,1,2), b = c(T,F,T), c = c(1,2,3))

  a     b c
1 1  TRUE 1
2 1 FALSE 2
3 2  TRUE 3

I wish to group by a and then perform operations based on whether a group has one or two rows. I would have thought this nested if_else would suffice:

tmp_df2 %>%
    group_by(a) %>%
    mutate(tmp_check = n() == 1) %>%
    mutate(d = if_else(tmp_check, # check for number of entries in group
                       0,
                       if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
    )
    )

But this throws the error:

Error in eval(substitute(expr), envir, enclos) : 
  `false` is length 2 not 1 or 1.

As the example is set up, when the outer condition (n() == 1) is true a single element is returned, but when it is false a two-element vector is returned, which I assume is what causes the error. Yet logically the statement seems sound to me.

The following two statements produce the desired results:

> tmp_df2 %>%
+     group_by(a) %>%
+     mutate(d = ifelse(rep(n() == 1, n()), # avoid undesired recycling
+                        0,
+                        if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
+     )
+     )
Source: local data frame [3 x 4]
Groups: a [2]

      a     b     c     d
  <dbl> <lgl> <dbl> <dbl>
1     1  TRUE     1   3.0
2     1 FALSE     2   1.5
3     2  TRUE     3   0.0

or just filtering so that only groups containing two rows are left:

> tmp_df2 %>%
+     group_by(a) %>%
+     filter(n() == 2) %>%
+     mutate(d = if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)]))
Source: local data frame [2 x 4]
Groups: a [1]

      a     b     c     d
  <dbl> <lgl> <dbl> <dbl>
1     1  TRUE     1   3.0
2     1 FALSE     2   1.5

I have two questions.

  1. How does dplyr know that the second output is invalid when, given the logical condition, it should never have been evaluated?

  2. How do I get the desired behaviour in dplyr (without using ifelse)?

EDIT: As noted in an answer, either drop the temporary tmp_check column and use the if ... else construct directly, or use the following code, which works but produces warnings:

library(dplyr)
tmp_df2 %>%
    group_by(a) %>%
    mutate(tmp_check = n() == 1) %>%
    mutate(d = if (tmp_check)  # check for number of entries in group
                       0 else
                       if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
    )

Alex asked Nov 07 '16 23:11


1 Answer

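dplyr "knows" because if_else is an ordinary function: it receives the complete `true` and `false` vectors and checks both of them, whichever elements of the condition turn out to be TRUE. The length requirement is stated in ?if_else, and the source tells us how it's done: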

if_else
# function (condition, true, false, missing = NULL) 
# {
#     if (!is.logical(condition)) {
#         stop("`condition` must be logical", call. = FALSE)
#     }
#     out <- true[rep(NA_integer_, length(condition))]
#     out <- replace_with(out, condition & !is.na(condition), true, 
#         "`true`")
#     out <- replace_with(out, !condition & !is.na(condition), 
#         false, "`false`")
#     out <- replace_with(out, is.na(condition), missing, "`missing`")
#     out
# }
# <environment: namespace:dplyr>

Inspecting the source for replace_with:

dplyr:::replace_with
# function (x, i, val, name) 
# {
#     if (is.null(val)) {
#         return(x)
#     }
#     check_length(val, x, name)
#     check_type(val, x, name)
#     check_class(val, x, name)
#     if (length(val) == 1L) {
#         x[i] <- val
#     }
#     else {
#         x[i] <- val[i]
#     }
#     x
# }
# <environment: namespace:dplyr>

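So replace_with is called for both `true` and `false`, and each call runs check_length whether or not i selects any elements. A value of the wrong length therefore raises an error even when it would never be used.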
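
The check can be seen in isolation with a small experiment. This sketch is not from the original post: it assumes only that dplyr is installed, and the names `strict`, `lenient` and `lazy` are made up for illustration. The wording of the error message differs between dplyr versions, so the code only records that an error occurred.

```r
library(dplyr)

# The condition is TRUE, so no element of `false` is ever selected,
# yet if_else() still validates the length of `false` and fails.
strict <- tryCatch(
  if_else(TRUE, 0, c(1, 2)),
  error = function(e) "error"
)

# Base ifelse() performs no such check and returns only the selected value.
lenient <- ifelse(TRUE, 0, c(1, 2))

# if ... else evaluates only the branch that is taken,
# so the stop() call below is never reached.
lazy <- if (TRUE) 0 else stop("never evaluated")
```

Here `strict` ends up as the string "error", while `lenient` and `lazy` are both 0.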

To get your desired behavior you can use if ... else, which evaluates only the branch it takes, as another SO user suggested in a previous question of yours:

tmp_df2 %>%
    group_by(a) %>%
    mutate(d = if (n() == 1) 0 else if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
    )
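
For reference, a self-contained version of this approach with the sample data from the question might look like the following. The name `res` is introduced here for illustration, and TRUE/FALSE are spelled out instead of T/F; otherwise the logic is unchanged. A plain `if` is appropriate because n() == 1 is a single value per group.

```r
library(dplyr)

# Sample data from the question
tmp_df2 <- data.frame(a = c(1, 1, 2), b = c(TRUE, FALSE, TRUE), c = c(1, 2, 3))

res <- tmp_df2 %>%
  group_by(a) %>%
  mutate(
    # n() == 1 is one TRUE/FALSE per group, so only one branch is evaluated
    d = if (n() == 1) {
      0  # single-row group
    } else {
      # two-row group: rows with b TRUE get sum(c) / c[b == TRUE],
      # rows with b FALSE get sum(c) / c[which(b != TRUE)]
      if_else(b, sum(c) / c[b == TRUE], sum(c) / c[which(b != TRUE)])
    }
  ) %>%
  ungroup()

res$d  # 3.0 1.5 0.0, matching the desired output in the question
```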

Weihuang Wong answered Nov 19 '22 01:11