Are there cases where it is not advantageous to use the magrittr pipe inside of R functions from the perspectives of (1) speed, and (2) ability to debug effectively?

Magrittr pipe in R functions

1 Answer

There are advantages and disadvantages to using a pipe inside a function. The biggest advantage is readability: the steps appear in the order they run, so it's easier to see what the function does. The biggest downsides are that error messages become harder to interpret and that the pipe breaks some of R's usual rules of evaluation.

Here's an example. Let's say we want to apply a pointless transformation to the mtcars dataset. Here's how we could do that with pipes...

library(tidyverse)
tidy_function <- function() {
  mtcars %>%
    group_by(cyl) %>%
    summarise(disp = sum(disp)) %>%
    mutate(disp = (disp^5) / 10000000000)  # same exponent as the other two versions
}

You can clearly see what's happening at every stage, even though it's not doing anything useful. Now let's look at the same code using the Dagwood Sandwich approach, where each call is nested inside the next...

base_function <- function() {
  mutate(summarise(group_by(mtcars, cyl), disp = sum(disp)), disp = (disp^5) / 10000000000)
}

This is much harder to read, even though it gives us the same result...

all.equal(tidy_function(), base_function())
# [1] TRUE

The most common way to avoid using either a pipe or a Dagwood Sandwich is to save the results of each step to an intermediate variable...

intermediate_function <- function() {
  x <- mtcars
  x <- group_by(x, cyl)
  x <- summarise(x, disp = sum(disp))
  mutate(x, disp = (disp^5) / 10000000000)
}

This is more readable than the nested version, and R will give you slightly more detailed information when there's an error, because each step is its own statement. It also obeys the traditional rules of evaluation. Again, it gives the same result as the other two functions...

all.equal(tidy_function(), intermediate_function())
# [1] TRUE
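One practical payoff of the intermediate-variable style is that a check can sit between any two steps, so a failure points at a specific stage. The following is a minimal sketch of that idea, not part of the original answer: `checked_function` is a hypothetical name, and it uses base R's `aggregate()` as a stand-in for the `group_by()`/`summarise()` steps so that it runs without extra packages.

```r
# Hypothetical illustration: validate each intermediate result before moving on.
checked_function <- function(data = mtcars) {
  # Step 0: confirm the input has the columns the later steps rely on
  stopifnot(is.data.frame(data), all(c("cyl", "disp") %in% names(data)))

  # Step 1: total displacement per cylinder count (base R analogue of
  # group_by(cyl) followed by summarise(disp = sum(disp)))
  totals <- aggregate(disp ~ cyl, data = data, FUN = sum)
  stopifnot(nrow(totals) == length(unique(data$cyl)))

  # Step 2: the same pointless transformation used in the answer
  totals$disp <- totals$disp^5 / 1e10
  totals
}

checked_function()
```

If the input is missing a column, the error comes from the first `stopifnot()` and names the failed condition, rather than surfacing somewhere inside a longer chain.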

You specifically asked about speed, so let's compare these three functions by running each of them 1000 times...

library(microbenchmark)
timing <-
  microbenchmark(tidy_function(),
                 intermediate_function(),
                 base_function(),
                 times = 1000L)
timing
# Unit: milliseconds
#                     expr      min       lq     mean   median       uq       max neval cld
#          tidy_function() 3.809009 4.403243 5.531429 4.800918 5.860111  23.37589  1000   a
#  intermediate_function() 3.560666 4.106216 5.154006 4.519938 5.538834  21.43292  1000   a
#          base_function() 3.610992 4.136850 5.519869 4.583573 5.696737 203.66175  1000   a

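Even in this trivial example, the pipe is a tiny bit slower than the other two options: its median time is roughly 0.2 to 0.3 ms (about 5 to 6%) higher. All three functions share the same letter in the cld column, though, so microbenchmark does not rank the difference as statistically significant in this run.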
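In the benchmark above, most of the time is spent inside the dplyr verbs, which makes the pipe's own cost hard to see. A rough way to isolate it is to pipe through very cheap functions. This is only a sketch under assumptions: `piped` and `nested` are hypothetical names, the input vector is arbitrary, and timings will vary by machine and package version, so no output is shown.

```r
library(magrittr)
library(microbenchmark)

x <- runif(1000)  # arbitrary test input

# The same computation written with and without the pipe
piped  <- function(x) x %>% sqrt() %>% sum()
nested <- function(x) sum(sqrt(x))

# Both forms should return exactly the same value
stopifnot(identical(piped(x), nested(x)))

# Compare the two; any gap here is mostly the pipe's overhead
microbenchmark(piped(x), nested(x), times = 1000L)
```

Whether a gap of this kind matters depends on how often the function is called; for a function that runs once on a large data frame it is unlikely to be noticeable.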

Conclusion

Feel free to use the pipe in your functions if it's the most comfortable way for you to write code. If you start running into problems or if you need your code to be as fast as humanly possible, then switch to a different paradigm.
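If the problem you run into is debugging rather than speed, one lighter option before rewriting is magrittr's tee operator `%T>%`, which runs a step for its side effect and passes the original value along unchanged. The sketch below assumes a toy numeric input, and `inspect_function` is a hypothetical name.

```r
library(magrittr)

inspect_function <- function(x) {
  x %>%
    sqrt() %T>%
    print() %>%   # prints the intermediate value, then passes it along unchanged
    sum()
}

inspect_function(c(1, 4, 9))
```

The call prints the intermediate vector before returning the sum. magrittr also exports `debug_pipe()`, which can be placed in a chain to drop into the browser at that point.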

100

answered Sep 20 '22 17:09

Andrew Brēza

Tags:

r

magrittr

pipeline

user2506086
