I have a nested if_else statement inside mutate. In my example data frame:
tmp_df2 <- data.frame(a = c(1,1,2), b = c(T,F,T), c = c(1,2,3))
  a     b c
1 1  TRUE 1
2 1 FALSE 2
3 2  TRUE 3
I wish to group by a and then perform operations based on whether a group has one or two rows. I would have thought this nested if_else would suffice:
library(dplyr)

tmp_df2 %>%
  group_by(a) %>%
  mutate(tmp_check = n() == 1) %>%
  mutate(d = if_else(tmp_check, # check for number of entries in group
                     0,
                     if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
                     )
         )
But this throws the error:
Error in eval(substitute(expr), envir, enclos) :
`false` is length 2 not 1 or 1.
The way the example is set up, when the first if_else(n() == 1) condition evaluates to TRUE, one element is returned, but when it evaluates to FALSE, a vector with two elements is returned, which I assume is what causes the error. Yet logically this statement seems sound to me.
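A minimal check of that assumption, calling dplyr's if_else() directly outside of mutate(). The length-1 condition stands in for n() == 1 and c(3, 1.5) stands in for the two-element else branch of the two-row group; these values are illustrative. The exact error wording depends on the installed dplyr version, so only the fact that it errors is relied on here.

```r
library(dplyr)

# Length-1 condition, length-2 `false`: rejected even though the
# condition is TRUE, so `false` would never be selected.
err <- tryCatch(if_else(TRUE, 0, c(3, 1.5)), error = function(e) e)
print(conditionMessage(err))

# With a condition as long as `false`, the same values are accepted.
ok <- if_else(c(FALSE, FALSE), 0, c(3, 1.5))
print(ok)
```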
The following two statements produce (desired) results:
tmp_df2 %>%
  group_by(a) %>%
  mutate(d = ifelse(rep(n() == 1, n()), # avoid undesired recycling
                    0,
                    if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
                    )
         )
Source: local data frame [3 x 4]
Groups: a [2]
      a     b     c     d
  <dbl> <lgl> <dbl> <dbl>
1     1  TRUE     1   3.0
2     1 FALSE     2   1.5
3     2  TRUE     3   0.0
or just filtering so that only groups containing two rows are left:
tmp_df2 %>%
  group_by(a) %>%
  filter(n() == 2) %>%
  mutate(d = if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)]))
Source: local data frame [2 x 4]
Groups: a [1]
      a     b     c     d
  <dbl> <lgl> <dbl> <dbl>
1     1  TRUE     1   3.0
2     1 FALSE     2   1.5
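A base-R sketch of why the first workaround wraps the condition in rep(): ifelse() sizes its result by test, silently recycles yes and no to that length, and only evaluates a branch when some element of test needs it. The vector c(3, 1.5) below is an illustrative stand-in for the else-branch values of the two-row group.

```r
# ifelse() sizes its result by `test`, recycling `yes`/`no` silently.

# Length-1 test (like n() == 1): only the first else value survives,
# and mutate() would then recycle that single value across the group.
one <- ifelse(FALSE, 0, c(3, 1.5))
print(one)

# Test repeated to the group size (like rep(n() == 1, n())): all values kept.
both <- ifelse(rep(FALSE, 2), 0, c(3, 1.5))
print(both)

# `no` is only evaluated when some element of `test` is FALSE,
# so an else branch that would fail is never touched here.
skipped <- ifelse(TRUE, 0, stop("never evaluated"))
print(skipped)
```

This lenience is what lets the ifelse() version run, and also why the rep() is needed: without it, the two-row group would silently get c(3, 3) instead of c(3, 1.5).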
I have two questions.
How does dplyr know that the second output, which should not have been evaluated given the logical condition, is invalid?
How do I get the desired behaviour in dplyr (without using ifelse)?
EDIT: As noted in an answer, either drop the temporary tmp_check column and use the if ... else construct, or use the following code, which works but produces warnings (in the two-row group tmp_check has length 2, and if() uses only its first element):
library(dplyr)
tmp_df2 %>%
  group_by(a) %>%
  mutate(tmp_check = n() == 1) %>%
  mutate(d = if (tmp_check) # check for number of entries in group
               0 else
               if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
         )
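Since R 4.2.0, calling if() with a condition of length greater than one is an error rather than a warning, so the snippet above fails on current R for the two-row group. A minimal adjustment, assuming tmp_check is constant within each group (it is, being n() == 1), is to test only its first element. The c[b] / c[!b] subsetting is used as an equivalent shorthand for c[b == T] / c[which(b != T)], which holds as long as b contains no NA.

```r
library(dplyr)

tmp_df2 <- data.frame(a = c(1, 1, 2), b = c(TRUE, FALSE, TRUE), c = c(1, 2, 3))

res <- tmp_df2 %>%
  group_by(a) %>%
  mutate(tmp_check = n() == 1) %>%
  # tmp_check is the same for every row of a group, so its first element
  # gives if() the length-1 condition it requires
  mutate(d = if (tmp_check[1]) 0 else
    if_else(b, sum(c) / c[b], sum(c) / c[!b])) %>%
  ungroup()

print(res)
```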
dplyr "knows" because if_else is an ordinary function that checks the values supplied for both the true and false cases, whichever elements of the condition are TRUE. This is stated in ?if_else, and the source shows how it is done:
if_else
# function (condition, true, false, missing = NULL)
# {
# if (!is.logical(condition)) {
# stop("`condition` must be logical", call. = FALSE)
# }
# out <- true[rep(NA_integer_, length(condition))]
# out <- replace_with(out, condition & !is.na(condition), true,
# "`true`")
# out <- replace_with(out, !condition & !is.na(condition),
# false, "`false`")
# out <- replace_with(out, is.na(condition), missing, "`missing`")
# out
# }
# <environment: namespace:dplyr>
Inspecting the source for replace_with:
dplyr:::replace_with
# function (x, i, val, name)
# {
# if (is.null(val)) {
# return(x)
# }
# check_length(val, x, name)
# check_type(val, x, name)
# check_class(val, x, name)
# if (length(val) == 1L) {
# x[i] <- val
# }
# else {
# x[i] <- val[i]
# }
# x
# }
# <environment: namespace:dplyr>
So the lengths (as well as the types and classes) of both the true and false values are checked, even when no element of the condition selects them.
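To make the mechanism concrete, here is a simplified base-R imitation of the length checks in the source above. strict_if_else() and check_len() are hypothetical names, the type and class checks are omitted, and this is not dplyr's actual (or current) implementation. Because the checks use both true and false, both arguments are evaluated on every call, which is exactly what if ... else avoids.

```r
# Hypothetical helper mirroring the old check_length(): a value must be
# length 1 or as long as the condition.
check_len <- function(val, n, name) {
  if (!(length(val) %in% c(1L, n))) {
    stop(sprintf("`%s` is length %d not 1 or %d.", name, length(val), n),
         call. = FALSE)
  }
}

# Hypothetical simplified if_else(): both branches are checked (and so
# evaluated) before being used, whatever `condition` contains.
strict_if_else <- function(condition, true, false) {
  stopifnot(is.logical(condition))
  n <- length(condition)
  check_len(true, n, "true")
  check_len(false, n, "false")
  out <- true[rep(NA_integer_, n)]  # NA vector with the type of `true`
  yes <- condition & !is.na(condition)
  no <- !condition & !is.na(condition)
  out[yes] <- rep_len(true, n)[yes]
  out[no] <- rep_len(false, n)[no]
  out
}

print(strict_if_else(c(TRUE, FALSE), 3, 1.5))

# Mirrors the question: length-1 condition, length-2 `false`
print(tryCatch(strict_if_else(TRUE, 0, c(3, 1.5)), error = conditionMessage))

# if ... else evaluates only the branch it takes, so no check is triggered
print(if (TRUE) 0 else c(3, 1.5))
```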
To get your desired behavior you can use if ... else, as another SO user suggested in a previous question of yours:
tmp_df2 %>%
group_by(a) %>%
mutate(d = if (n() == 1) 0 else if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
)