
Order of execution of nested functions in dplyr pipe

Tags: r, dplyr, magrittr

When I use a nested function call in a piping step, the order of execution seems unintuitive.

df <- data.frame(a = c(1,NA,2), b = c(NA, NA, 1))
df %>% is.na %>% colSums # Produces the correct count of missing values
df %>% colSums(is.na(.)) # Produces NA

Can anyone explain why the nested function in the third line does not produce the correct result?

Heisenberg asked Jan 15 '16 19:01



1 Answer

It's because the left-hand side still gets passed as the first argument to the following function whenever the . appears only inside a nested call such as is.na(.). So in your second attempt at colSums, you assume that you're passing is.na(.) as the first argument to colSums, but you're actually passing it as the second, which is the na.rm parameter. What your code actually runs is df %>% colSums(x = ., na.rm = is.na(.)). You can prevent the left-hand side from being inserted as the first argument by wrapping the call in {}:

df %>% {colSums(is.na(.))}
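To make the argument matching concrete, here is a minimal sketch that reuses the data frame from the question. It assumes the magrittr package is installed (dplyr re-exports the same %>% operator); the variable names implicit, explicit, braced and chained are only illustrative.

```r
library(magrittr)  # provides %>%

df <- data.frame(a = c(1, NA, 2), b = c(NA, NA, 1))

# The dot appears only inside a nested call, so df is still inserted as the
# first argument and is.na(df) is matched positionally to na.rm.
implicit <- df %>% colSums(is.na(.))

# The same call written out without the pipe.
explicit <- colSums(df, na.rm = is.na(df))
identical(implicit, explicit)

# Braces suppress the first-argument insertion, so only is.na(df) is summed.
braced <- df %>% {colSums(is.na(.))}

# The two-step pipe from the question, for comparison.
chained <- df %>% is.na %>% colSums
identical(braced, chained)
```

If the explanation holds, both identical() calls should return TRUE: the first shows that the piped call is the same as calling colSums with is.na(df) in the na.rm slot, and the second shows that the braced form matches the two-step pipe.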

tblznbits answered Oct 25 '22 15:10