Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Magrittr pipe in R functions

Are there cases where it is not advantageous to use the magrittr pipe inside of R functions from the perspectives of (1) speed, and (2) ability to debug effectively?

like image 339
user2506086 Avatar asked Jul 17 '17 17:07

user2506086


People also ask

What does Magrittr do in R?

magrittr: A Forward-Pipe Operator for RProvides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

How does the %>% pipe work in R?

What does the pipe do? The pipe operator, written as %>% , has been a longstanding feature of the magrittr package for R. It takes the output of one function and passes it into another function as an argument. This allows us to link a sequence of analysis steps.

What is the function of pipe in R?

Pipes are an extremely useful tool from the magrittr package 1 that allow you to express a sequence of multiple operations. They can greatly simplify your code and make your operations more intuitive. However they are not the only way to write your code and combine multiple operations.

What does %>% mean in Tidyverse?

Use %>% to emphasise a sequence of actions, rather than the object that the actions are being performed on. Avoid using the pipe when: You need to manipulate more than one object at a time. Reserve pipes for a sequence of steps applied to one primary object.


1 Answers

There are advantages and disadvantages to using a pipe inside of a function. The biggest advantage is that it's easier to see what's happening within a function when you read the code. The biggest downsides are that error messages become harder to interpret and the pipe breaks some of R's rules of evaluation.

Here's an example. Let's say we want to make a pointless transformation to the mtcars dataset. Here's how we could do that with pipes...

library(tidyverse)
tidy_function <- function() {
  mtcars %>%
    group_by(cyl) %>%
    summarise(disp = sum(disp)) %>%
    mutate(disp = (disp ^ 4) / 10000000000)
}

You can clearly see what's happening at every stage, even though it's not doing anything useful. Now let's look at the time code using the Dagwood Sandwich approach...

base_function <- function() {
  mutate(summarise(group_by(mtcars, cyl), disp = sum(disp)), disp = (disp^5) / 10000000000)
}

Much harder to read, even though it gives us the same result...

all.equal(tidy_function(), base_function())
# [1] TRUE

The most common way to avoid using either a pipe or a Dagwood Sandwich is to save the results of each step to an intermediate variable...

intermediate_function <- function() {
  x <- mtcars
  x <- group_by(x, cyl)
  x <- summarise(x, disp = sum(disp))
  mutate(x, disp = (disp^5) / 10000000000)
}

More readable than the last function and R will give you a little more detailed information when there's an error. Plus it obeys the traditional rules of evaluation. Again, it gives the same results as the other two functions...

all.equal(tidy_function(), intermediate_function())
# [1] TRUE

You specifically asked about speed, so let's compare these three functions by running each of them 1000 times...

library(microbenchmark)
timing <-
  microbenchmark(tidy_function(),
                 intermediate_function(),
                 base_function(),
                 times = 1000L)
timing
#Unit: milliseconds
                    #expr      min       lq     mean   median       uq       max neval cld
         #tidy_function() 3.809009 4.403243 5.531429 4.800918 5.860111  23.37589  1000   a
 #intermediate_function() 3.560666 4.106216 5.154006 4.519938 5.538834  21.43292  1000   a
         #base_function() 3.610992 4.136850 5.519869 4.583573 5.696737 203.66175  1000   a

Even in this trivial example, the pipe is a tiny bit slower than the other two options.

Conclusion

Feel free to use the pipe in your functions if it's the most comfortable way for you to write code. If you start running into problems or if you need your code to be as fast as humanly possible, then switch to a different paradigm.

like image 100
Andrew Brēza Avatar answered Sep 20 '22 17:09

Andrew Brēza