Why should I use | vs any() when I'm comparing columns in dplyr::mutate()? And why do they return different answers?
For example:
library(tidyverse)
# tibble() replaces the deprecated data_frame(); TRUE/FALSE are safer than T/F
df <- tibble(x = rep(c(TRUE, FALSE, TRUE), 4), y = rep(c(TRUE, FALSE, TRUE, FALSE), 3), allF = FALSE, allT = TRUE)
df %>%
  mutate(
    withpipe = x | y # returns expected results by row
    , usingany = any(c(x, y)) # returns TRUE for every row
  )
What's going on here and why should I use one way of comparing values over another?
The difference between the two is how the answer is calculated:
With |, elements are compared row-wise and boolean logic is used to return the proper value. In the example above, each x and y pair is compared and a logical value is returned for each pair, resulting in 12 different answers, one for each row of the data frame.
any(), on the other hand, looks at the entire vector and returns a single value. In the above example, the mutate line that calculates the new usingany column is basically doing this: any(c(df$x, df$y)), which returns TRUE because there's at least one TRUE value in either df$x or df$y. That single value is then assigned to every row of the data frame.
You can see this in action using the other columns in your data frame:
df %>%
  mutate(
    usingany = any(c(x, y)) # returns all TRUE
    , allfany = any(allF) # returns all FALSE because every value in df$allF is FALSE
  )
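The same distinction can be seen in base R without dplyr. The sketch below uses two short logical vectors with made-up values (x, y, and the names elementwise, overall, and demo are purely illustrative, not part of the original example) to show that | returns one value per element, while any() returns a single value that is then recycled across rows:

```r
# Two small logical vectors standing in for columns (illustrative values only).
x <- c(TRUE, FALSE, TRUE, FALSE)
y <- c(TRUE, FALSE, FALSE, FALSE)

# `|` is vectorized: it compares x[i] with y[i] and returns one value per element.
elementwise <- x | y
print(elementwise)          # TRUE FALSE TRUE FALSE
print(length(elementwise))  # 4

# any() is a summary function: it collapses its whole input into one value.
overall <- any(c(x, y))
print(overall)              # TRUE
print(length(overall))      # 1

# A length-1 value placed in a data frame column is recycled to every row,
# which mirrors why the any() column was TRUE everywhere inside mutate().
demo <- data.frame(x, y, elementwise, overall)
print(demo)
```

Checking the length of each result is a quick way to tell whether an expression inside mutate() will vary by row or be repeated for every row.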
To answer when you should use which: use | when you want to compare elements row-wise. Use any() when you want a single answer about an entire column (or several columns combined).
TL;DR: when using dplyr::mutate(), you're usually going to want to use |.