
Using any() vs | in dplyr::mutate

Tags: r, dplyr, tidyverse

Why should I use | vs any() when I'm comparing columns in dplyr::mutate()?

And why do they return different answers?

For example:

library(tidyverse)

# tibble() replaces the deprecated data_frame(); TRUE/FALSE are safer than T/F
df <- tibble(x = rep(c(TRUE, FALSE, TRUE), 4), y = rep(c(TRUE, FALSE, TRUE, FALSE), 3), allF = FALSE, allT = TRUE)

df %>%
  mutate(
    withpipe = x | y,        # returns expected results by row
    usingany = any(c(x, y))  # returns TRUE for every row
  )

What's going on here and why should I use one way of comparing values over another?

crazybilly asked May 09 '18 17:05


1 Answer

The two differ in how the result is calculated:

  • | is vectorized: it compares x and y element by element and returns one logical value per pair. In the example above that gives 12 results, one for each row of the data frame.
  • any(), on the other hand, collapses its whole input to a single value. In the example above, the mutate() line that creates the usingany column is effectively computing any(c(df$x, df$y)), which returns TRUE because at least one value in df$x or df$y is TRUE. mutate() then recycles that single value to every row of the data frame.
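To make the length difference concrete, here is a minimal base R sketch that rebuilds x and y from the question as plain vectors. The names elementwise, collapsed and recycled are made up for illustration, and the rep() call only approximates what mutate() does when it recycles a length-1 result.

```r
# Same x and y as in the question, as plain vectors (no dplyr needed)
x <- rep(c(TRUE, FALSE, TRUE), 4)
y <- rep(c(TRUE, FALSE, TRUE, FALSE), 3)

elementwise <- x | y         # one logical value per position
collapsed   <- any(c(x, y))  # a single logical value for all 24 inputs

length(elementwise)  # 12
length(collapsed)    # 1

# Roughly what mutate() does with a length-1 result: repeat it for every row
recycled <- rep(collapsed, length(x))
```

Since collapsed is a single TRUE, every element of recycled is TRUE, which matches the usingany column in the question.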

You can see this in action using the other columns in your data frame:

df %>% 
    mutate(
        usingany = any(c(x,y)) # returns all TRUE
      , allfany  = any(allF)   # returns all FALSE because every value in df$allF is FALSE
    )

To answer when you should use which: use | when you want to compare values element by element, giving one result per row. Use any() when you want a single answer about an entire column (or several columns combined).
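If you want the per-row behaviour of | across more than two columns, one option is to chain | with base R's Reduce(). This is only a sketch using plain vectors; any_by_row and any_overall are illustrative names, not something from the question.

```r
x    <- rep(c(TRUE, FALSE, TRUE), 4)
y    <- rep(c(TRUE, FALSE, TRUE, FALSE), 3)
allF <- rep(FALSE, 12)

# Per-row answer: equivalent to x | y | allF, one value per position
any_by_row <- Reduce("|", list(x, y, allF))

# Single answer: is anything TRUE anywhere in the three vectors?
any_overall <- any(x, y, allF)
```

Here any_by_row has 12 values and any_overall has one, so the choice between them comes down to which of those shapes you need.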

TL;DR: when using dplyr::mutate(), you're usually going to want to use |.

crazybilly answered Oct 06 '22 23:10