When I use a nested function in a piping step, the order of execution seems unintuitive.
library(magrittr)
df <- data.frame(a = c(1, NA, 2), b = c(NA, NA, 1))
df %>% is.na %>% colSums # Produces the correct count of missing values
df %>% colSums(is.na(.)) # Produces NA
Can anyone explain why the nested function in the third line does not produce the correct result?
What does the pipe do? The pipe operator, written as %>%, has been a longstanding feature of the magrittr package for R. It takes the output of one function and passes it into another function as an argument, by default the first one. This allows us to link a sequence of analysis steps.
The pipe operator is used when we would otherwise nest function calls in R, where the result of one function becomes the argument for the next function. Piping mainly improves the readability of the code, because the steps are written in the order they run.
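As a small illustration of the nested versus piped style described above, here is a minimal sketch. The vector x and the chain of sqrt, mean and round are made up for the example; only the magrittr package is assumed to be installed.

```r
library(magrittr)

# Hypothetical input, chosen only for illustration
x <- c(1, 4, 9, 16)

# Nested form: has to be read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped form: read left to right; each result becomes the
# first argument of the next call
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)
```

Both forms compute the same value; the piped version only changes how the steps are written.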
In R, the pipe operator is, as you have already seen, %>%. If you're not familiar with F#, you can think of this operator as being similar to the + in a ggplot2 statement.
And since R is a functional programming language, meaning that everything you do is basically built on functions, you can use the pipe operator to feed a value into just about any function call. For example, we can pipe a data frame into a linear regression function and then get the summary of the regression parameters.
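The regression example mentioned above could look roughly like this. It is only a sketch: the built-in mtcars data set and the formula mpg ~ wt are assumptions picked for illustration, not something from the original post.

```r
library(magrittr)

# mtcars ships with base R (datasets package); the formula is an
# arbitrary choice for this sketch
fit_summary <- mtcars %>%
  lm(mpg ~ wt, data = .) %>%  # `.` used directly as an argument, so the data frame goes to `data`
  summary()

# Coefficient table of the fitted model
fit_summary$coefficients
```

Because the . appears directly as an argument of lm, magrittr does not also insert the data frame as the first argument.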
It's because the left-hand side always gets passed as the first argument to the following function, unless . appears as a direct argument of that call. In your second attempt the . sits inside is.na(), not directly in colSums(), so df is still inserted first. You assume that you're passing is.na(.) as the first argument to colSums, but you're actually passing it as the second, which is the na.rm parameter. So what your code actually looks like is df %>% colSums(x = ., na.rm = is.na(.)). You can prevent the . being passed as the first parameter by using {} around the function: df %>% {colSums(is.na(.))}
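To check that the braces behave as described, here is a short sketch that reuses the df from the question and compares the braced pipe with the two-step pipe and a plain nested call. The variable names step_wise, braced and plain are introduced only for this example.

```r
library(magrittr)

df <- data.frame(a = c(1, NA, 2), b = c(NA, NA, 1))

# Two-step pipe from the question: is.na() first, then colSums()
step_wise <- df %>% is.na() %>% colSums()

# Braces stop magrittr from inserting df as the first argument,
# so . is only used where it is written
braced <- df %>% {colSums(is.na(.))}

# Plain nested call, for comparison
plain <- colSums(is.na(df))

# All three give a = 1 and b = 2
identical(step_wise, braced) && identical(braced, plain)
```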