
dplyr error: strange issue when combining group_by, mutate and ifelse. Is it a bug?

I am seeing a strange issue with dplyr when combining group_by, mutate and ifelse. Consider the following data.frame:

> df1
  crawl.id group.id hits.diff
1        1        1        NA
2        1        2        NA
3        2        2         0
4        1        3        NA
5        1        3        NA
6        1        3        NA

When I run the following code:

library(dplyr)
df1 %>%
  group_by(group.id) %>%
  mutate(hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))

For some reason I get

Error: incompatible types, expecting a logical vector

However, if I remove either group_by() or ifelse(), everything works as expected:

df1 %>%
  mutate(hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))

  crawl.id group.id hits.diff hits.consumed
1        1        1        NA            NA
2        1        2        NA            NA
3        2        2         0             0
4        1        3        NA            NA
5        1        3        NA            NA
6        1        3        NA            NA

df1 %>%
  group_by(group.id) %>%
  mutate(hits.consumed = -hits.diff)

  crawl.id group.id hits.diff hits.consumed
1        1        1        NA            NA
2        1        2        NA            NA
3        2        2         0             0
4        1        3        NA            NA
5        1        3        NA            NA
6        1        3        NA            NA

Is it a bug or a feature? Can anyone replicate this? What's so special about that specific combination of group_by, mutate and ifelse that makes it fail?

My own research led me to https://github.com/hadley/dplyr/issues/464, which suggests that this should have been fixed by now.

Here is dput(df1):

structure(list(crawl.id = c(1, 1, 2, 1, 1, 1),
               group.id = structure(c(1L, 2L, 2L, 3L, 3L, 3L),
                                    .Label = c("1", "2", "3"),
                                    class = "factor"),
               hits.diff = c(NA, NA, 0, NA, NA, NA)),
          .Names = c("crawl.id", "group.id", "hits.diff"),
          row.names = c(NA, -6L),
          class = "data.frame")
akhmed asked Mar 24 '15 03:03


1 Answer

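ifelse() returns a logical vector when every value of its test is NA (a bare NA is logical by default). Groups whose hits.diff is all NA therefore produce a logical result while the remaining group produces a numeric one, which appears to be what the grouped mutate() rejects. Wrapping the whole call in as.numeric() forces a consistent output type: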

df1 %>%
  group_by(group.id) %>%
  mutate(hits.consumed = as.numeric(ifelse(hits.diff <= 0, -hits.diff, 0)))

#  crawl.id group.id hits.diff hits.consumed
#1        1        1        NA            NA
#2        1        2        NA            NA
#3        2        2         0             0
#4        1        3        NA            NA
#5        1        3        NA            NA
#6        1        3        NA            NA
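The per-group type mismatch can be reproduced without dplyr. The sketch below assumes only base R and applies the same ifelse() call to each group's hits.diff values separately; the names per_group, classes and fixed_classes are introduced here for illustration and are not part of the original answer.

```r
# Same values as df1$hits.diff and df1$group.id
hits.diff <- c(NA, NA, 0, NA, NA, NA)
group.id  <- factor(c(1, 2, 2, 3, 3, 3))

# Evaluate the ifelse() expression once per group, roughly as a grouped
# mutate() would (an assumption about the mechanism, not dplyr's actual code)
per_group <- lapply(split(hits.diff, group.id),
                    function(x) ifelse(x <= 0, -x, 0))

# Result type of each group's piece
classes <- vapply(per_group, class, character(1))
print(classes)

# Coercing each piece with as.numeric() makes the types agree
fixed_classes <- vapply(per_group, function(x) class(as.numeric(x)), character(1))
print(fixed_classes)
```

Groups 1 and 3 contain only NA and come back as logical, while group 2 comes back as numeric; after as.numeric() all three pieces share one type.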

I'm fairly sure this is the same issue as in "Custom sum function in dplyr returns inconsistent results", as this result suggests:

out <- df1[1:2,] %>% mutate(hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
class(out$hits.consumed)
#[1] "logical"

out <- df1[1:3,] %>% mutate(hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
class(out$hits.consumed)
#[1] "numeric"
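One possible alternative, not suggested in the original answer, is to avoid ifelse() for this particular expression. pmax(-hits.diff, 0) should give the same values, and it takes its type from its numeric arguments rather than from a logical test. A minimal base R sketch, where x and all_na are made-up example vectors:

```r
# Made-up example values covering negative, positive, zero and NA
x <- c(-2, 3, 0, NA)

via_ifelse <- ifelse(x <= 0, -x, 0)  # expression from the question
via_pmax   <- pmax(-x, 0)            # alternative without ifelse()

# Both approaches agree on these values
print(all.equal(via_ifelse, via_pmax))

# On an all-NA numeric input the result types differ
all_na <- c(NA_real_, NA_real_)
class_ifelse <- class(ifelse(all_na <= 0, -all_na, 0))
class_pmax   <- class(pmax(-all_na, 0))
print(c(ifelse = class_ifelse, pmax = class_pmax))
```

This only works because the expression happens to be a clamp at zero; for a general condition, the as.numeric() wrapper above remains the more direct fix.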
thelatemail answered Oct 23 '22 16:10