I am having strange issues with dplyr and the combination of group_by, mutate and ifelse. Consider the following data.frame:
    > df1
      crawl.id group.id hits.diff
    1        1        1        NA
    2        1        2        NA
    3        2        2         0
    4        1        3        NA
    5        1        3        NA
    6        1        3        NA
When I run the following code:
    library(dplyr)
    df1 %>%
      group_by(group.id) %>%
      mutate(hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
for some reason I get this error:

    Error: incompatible types, expecting a logical vector
However, if I remove either group_by() or ifelse(), everything works as expected:
    df1 %>% mutate(hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
      crawl.id group.id hits.diff hits.consumed
    1        1        1        NA            NA
    2        1        2        NA            NA
    3        2        2         0             0
    4        1        3        NA            NA
    5        1        3        NA            NA
    6        1        3        NA            NA

    df1 %>% group_by(group.id) %>% mutate(hits.consumed = -hits.diff)
      crawl.id group.id hits.diff hits.consumed
    1        1        1        NA            NA
    2        1        2        NA            NA
    3        2        2         0             0
    4        1        3        NA            NA
    5        1        3        NA            NA
    6        1        3        NA            NA
Is it a bug or a feature? Can anyone replicate this? What's so special about that specific combination of group_by, mutate and ifelse that makes it fail?
My own research led me to https://github.com/hadley/dplyr/issues/464, which suggests that this should already be fixed.
Here is dput(df1):

    structure(list(crawl.id = c(1, 1, 2, 1, 1, 1),
                   group.id = structure(c(1L, 2L, 2L, 3L, 3L, 3L), .Label = c("1", "2", "3"), class = "factor"),
                   hits.diff = c(NA, NA, 0, NA, NA, NA)),
              .Names = c("crawl.id", "group.id", "hits.diff"),
              row.names = c(NA, -6L), class = "data.frame")
Wrap it all in as.numeric() to force the output type, so that the NAs, which are logical by default, don't override the class of the output variable:
    df1 %>%
      group_by(group.id) %>%
      mutate(hits.consumed = as.numeric(ifelse(hits.diff <= 0, -hits.diff, 0)))
    #  crawl.id group.id hits.diff hits.consumed
    #1        1        1        NA            NA
    #2        1        2        NA            NA
    #3        2        2         0             0
    #4        1        3        NA            NA
    #5        1        3        NA            NA
    #6        1        3        NA            NA
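A type-stable alternative that avoids ifelse() altogether is sketched here; it is a swapped-in technique, not part of the as.numeric() fix. Since ifelse(d <= 0, -d, 0) equals -d when d <= 0 and 0 otherwise, it is the same as pmax(-d, 0), which returns a double for double input even when every value is NA. The sketch rebuilds df1 by hand in base R and uses split() as a stand-in for group_by(); the helper name consumed is hypothetical.

```r
# Rebuild df1 from the dput() output in the question (base R only).
df1 <- data.frame(
  crawl.id  = c(1, 1, 2, 1, 1, 1),
  group.id  = factor(c(1, 2, 2, 3, 3, 3)),
  hits.diff = c(NA, NA, 0, NA, NA, NA)
)

# Hypothetical helper: ifelse(d <= 0, -d, 0) is -d when d <= 0 and 0 otherwise,
# i.e. max(-d, 0) elementwise. pmax() keeps NA (na.rm = FALSE by default) and
# returns a double for double input, even when every element is NA.
consumed <- function(d) pmax(-d, 0)

# Apply it per group with split(), standing in for group_by().
per.group <- lapply(split(df1$hits.diff, df1$group.id), consumed)
print(sapply(per.group, class))   # all "numeric"
```

Inside the pipeline this would presumably become mutate(hits.consumed = pmax(-hits.diff, 0)), though it is untested against the dplyr version from the question.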
I'm pretty sure this is the same issue as the one in "Custom sum function in dplyr returns inconsistent results", as this result suggests:
    out <- df1[1:2,] %>% mutate(hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
    class(out$hits.consumed)
    #[1] "logical"

    out <- df1[1:3,] %>% mutate(hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
    class(out$hits.consumed)
    #[1] "numeric"
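To see what is special about the grouped case, the same ifelse() can be applied to each group separately. This is a minimal base-R sketch (no dplyr), assuming that a grouped mutate() evaluates the expression once per group and that, in the dplyr version from the question, the column type is fixed by the first group's result, as the error message suggests. Group 1 holds only NA, so ifelse() returns a logical NA; group 2 contains a 0, so its result is numeric; group 3 is all NA and logical again.

```r
# Rebuild df1 from the dput() output in the question (base R only).
df1 <- data.frame(
  crawl.id  = c(1, 1, 2, 1, 1, 1),
  group.id  = factor(c(1, 2, 2, 3, 3, 3)),
  hits.diff = c(NA, NA, 0, NA, NA, NA)
)

# Evaluate the ifelse() once per group, as a grouped mutate() is assumed to do.
per.group <- lapply(split(df1$hits.diff, df1$group.id),
                    function(x) ifelse(x <= 0, -x, 0))

# ifelse() starts from the logical test vector and only changes type when
# some test is TRUE or FALSE; an all-NA test leaves a logical result.
group.classes <- sapply(per.group, class)
print(group.classes)   # group 1: "logical", group 2: "numeric", group 3: "logical"

# Coercing each group's result, as in the as.numeric() fix, makes them agree.
fixed.classes <- sapply(per.group, function(v) class(as.numeric(v)))
print(fixed.classes)   # all "numeric"
```

The mismatch between group 1 (logical) and group 2 (numeric) is exactly what "expecting a logical vector" complains about, while the ungrouped call sees the 0 in row 3 and returns numeric for the whole column.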