Why should I use | vs any() when I'm comparing columns in dplyr::mutate()?
And why do they return different answers?
For example:
library(tidyverse)
# tibble() replaces the deprecated data_frame(); TRUE/FALSE are safer than T/F
df <- tibble(x = rep(c(TRUE, FALSE, TRUE), 4), y = rep(c(TRUE, FALSE, TRUE, FALSE), 3), allF = FALSE, allT = TRUE)
df %>%
  mutate(
    withpipe = x | y           # returns expected results by row
    , usingany = any(c(x, y))  # returns TRUE for every row
  )
What's going on here and why should I use one way of comparing values over another?
The difference between the two is how the answer is calculated:
With |, elements are compared pairwise and Boolean logic returns a value for each pair. In the example above, each x and y pair is compared and a logical value is returned for that pair, giving 12 results, one for each row of the data frame.
any(), on the other hand, looks at the entire vector and returns a single value. In the example above, the mutate() line that calculates the new usingany column is effectively doing any(c(df$x, df$y)), which returns TRUE because there is at least one TRUE value in df$x or df$y. That single value is then recycled to every row of the data frame.
You can see this in action using the other columns in your data frame:
df %>%
  mutate(
    usingany = any(c(x, y))  # returns all TRUE
    , allfany = any(allF)    # returns all FALSE because every value in df$allF is FALSE
  )
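The same behaviour can be reproduced without dplyr, which may make the recycling step easier to see. This is a minimal base R sketch, assuming the same x and y vectors as in the question; the names elementwise and collapsed are only illustrative.

```r
# Same values as the question's x and y columns, as plain vectors.
x <- rep(c(TRUE, FALSE, TRUE), 4)
y <- rep(c(TRUE, FALSE, TRUE, FALSE), 3)

# `|` is vectorised: it returns one result per pair of elements.
elementwise <- x | y
length(elementwise)  # 12

# any() collapses its whole input into a single logical value.
collapsed <- any(c(x, y))
length(collapsed)    # 1

# Assigning a length-1 value to a 12-row column recycles it,
# so every row ends up holding the same answer.
df <- data.frame(x = x, y = y)
df$withpipe <- elementwise
df$usingany <- collapsed
```

Here withpipe is FALSE only in the rows where both x and y are FALSE, while usingany is TRUE in every row.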
To answer when you should use which: use | when you want to compare elements row by row. Use any() when you want a single answer about a whole column (or about each group, if the data frame is grouped).
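A case where any() is the right tool inside mutate() is a grouped data frame, where it gives one answer per group instead of one per row. The sketch below is illustrative only: the grouping column g and the values in this small df are made up for the example and are not part of the original question.

```r
library(dplyr)

# Hypothetical data: g is an invented grouping column.
df <- tibble(
  g = rep(c("a", "b"), each = 3),
  x = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
  y = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

result <- df %>%
  group_by(g) %>%
  mutate(
    either_row = x | y,     # one answer per row
    any_in_group = any(x)   # one answer per group, recycled within the group
  ) %>%
  ungroup()

# A single answer for the whole column, with no recycling involved.
overall <- df %>% summarise(any_x = any(x))
```

In this sketch any_in_group is FALSE for every row of group "a" and TRUE for every row of group "b", while either_row still varies row by row.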
TLDR, when using dplyr::mutate(), you're usually going to want to use |.