Are there any objective reasons why pipe operators from the R package magrittr, such as %>%, should be avoided when I program packages in R? More specifically, I want to know whether using pipe operators might cause coding conflicts or affect performance, positively or negatively. I am looking for specific, concrete examples of such cases.


Should I avoid programming packages with pipe operators?

3 Answers

Like all advanced functions written in R, %>% carries a lot of overhead, so don't use it in loops (this includes implicit loops, such as the *apply family, or the per group loops in packages like dplyr or data.table). Here's an example:

library(magrittr)
x = 1:10

system.time({for(i in 1:1e5) identity(x)})
#   user  system elapsed 
#   0.07    0.00    0.08 
system.time({for(i in 1:1e5) x %>% identity})
#   user  system elapsed 
#  15.39    0.00   16.68
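Timings like these depend on the machine and on the installed magrittr version, so it is worth re-measuring before deciding. Below is a minimal, dependency-free sketch of such a measurement. The operator `%toy%` is a hypothetical stand-in for any pipe implemented as an ordinary R function; it is not magrittr's implementation and will not reproduce the numbers above.

```r
# Hypothetical toy pipe: a stand-in for any pipe written as an R function.
# NOT magrittr's implementation; it only illustrates per-call overhead.
`%toy%` <- function(lhs, rhs) rhs(lhs)

x <- 1:10
n <- 1e5

# Direct call inside the loop
t_direct <- system.time(for (i in seq_len(n)) identity(x))

# Same work routed through the R-level operator on every iteration
t_piped <- system.time(for (i in seq_len(n)) x %toy% identity)

# Both forms compute the same value; only the cost per iteration differs
stopifnot(identical(x %toy% identity, identity(x)))

# Elapsed seconds for each variant (values vary by machine)
print(c(direct = t_direct[["elapsed"]], piped = t_piped[["elapsed"]]))
```

To time the real operator, replace `%toy%` with `%>%` after `library(magrittr)`.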


answered Oct 05 '22 19:10

eddi

Adding dependencies to a package shouldn't be taken too lightly. Speaking generally, every package that your package depends on is a risk for future maintenance, either when the dependency updates or if it stops being maintained. It also makes it (slightly) harder for people to install your package, though only noticeably so when an internet connection is unreliable or when a dependency is difficult to install on certain systems or hardware. But if someone wants to put your package on a thumb drive to install somewhere, they will also need to make sure they have all of your dependencies (and the dependencies of your dependencies...).

Base R and the default packages have a long history, and R-Core is very conscious of not introducing changes that will break downstream dependencies. magrittr is much newer; it looks like it first appeared on CRAN in February 2014.

Practically speaking, magrittr has been stable and seems like a low risk dependency. Especially if you are importing just %>% and ignoring the more esoteric operators it provides (as is done by dplyr, tidyr, et al.) you are probably quite safe. Its popularity almost guarantees that even if its creator abandons it, someone will take over the maintenance.
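For reference, importing only the operator usually comes down to two declarations. The fragment below is a sketch that assumes the package is documented with roxygen2; the file name is a hypothetical convention, not something taken from this thread.

```r
# DESCRIPTION (not R code, shown here as a comment):
#   Imports: magrittr

# R/utils-pipe.R (hypothetical file name). roxygen2 turns the tag below
# into the NAMESPACE directive importFrom(magrittr, "%>%"), so only the
# pipe is imported and none of magrittr's other operators.
#' @importFrom magrittr %>%
NULL
```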

Now in 2022 we've had a couple of R releases featuring the base pipe |>, so there's a nice alternative with zero dependencies as long as you can run R version 4.1.0 or greater.
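A small sketch of what the base pipe does, assuming R 4.1.0 or later: the parser rewrites the pipeline into an ordinary nested call, so no extra function is invoked at run time. The variable names are illustrative only.

```r
# Requires R >= 4.1.0 for the |> syntax.

# The parser rewrites a base-pipe expression into a plain nested call,
# so the two quoted expressions are the same language object.
piped  <- quote(x |> sqrt() |> sum())
nested <- quote(sum(sqrt(x)))
print(identical(piped, nested))

# Ordinary use: sqrt() is applied first, then sum()
x <- c(1, 4, 9)
total <- x |> sqrt() |> sum()
print(total)
```

Unlike %>%, the base pipe requires the parentheses on the right-hand side: `x |> sqrt()` is valid, `x |> sqrt` is a syntax error.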

answered Oct 05 '22 19:10

Gregor Thomas

The piping paradigm inverts the apparent order of function application in comparison with "standard functional programming". Whether this has adverse consequences depends on the function semiotics (my original mispledding was intended to be 'semantics' but the spielchucker thought I meant semiotics and that seemed OK). I happen to think piping creates code that is less readable, but that is because I have trained my brain to look at coding from the "inside-out". Compare:

 y <- func3 ( func2( func1( x) ) )

 y <- x %>% func1 %>% func2 %>% func3

To my way of thinking the first one is more readable since the information "flows" outward (and consistently leftward) and ends up in the leftmost position y, whereas the information in the second one flows to the right and then "turns around" and is sent to the left. The piping paradigm also allows argument-less function application, which I think increases the potential for error. R programming with only positional parameter matching often produces error messages that are totally inscrutable, whereas disciplining yourself to always (or almost always) use argument names has the benefit of much more informative error messages.
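The point about argument names can be illustrated with a short sketch. The function `scale_by` and its parameters are hypothetical, and the base pipe (R >= 4.1.0) is used only to keep the example free of dependencies; the same applies to %>%.

```r
# Hypothetical helper, used only for illustration
scale_by <- function(x, factor, offset = 0) x * factor + offset

x <- 1:3

# Positional matching: the reader must remember which number is which
a <- x |> scale_by(2, 1)

# Named matching: same result, but each value states its role
b <- x |> scale_by(factor = 2, offset = 1)
stopifnot(identical(a, b))

# A misspelled name fails immediately and the message names the culprit,
# instead of a value silently landing in the wrong parameter
msg <- tryCatch(
  x |> scale_by(fctor = 2),
  error = function(e) conditionMessage(e)
)
print(msg)
```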

My preference would have been for a piping paradigm that had a consistent direction:

 y <- func3 %<% func2 %<% func1 %<% x
 # Or
 x %>% func1 %>% func2 %>% func3 -> y

And I think this was actually part of the original design of pkg-magrittr, which I believe included a 'left-pipe' as well as a 'right-pipe'. So this is probably a human-factors design issue. R has left-to-right associativity and the typical user of the dplyr/magrittr piping paradigm generally obeys that rule. I probably have stiff-brain syndrome, and all you young guys are probably the future, so you make your choice. I really do admire Hadley's goal of rationalizing data input and processing so that files and SQL servers are seen as generalized serial devices.
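Both consistent-direction forms can be tried out with a small sketch. The `%<%` operator below is a hypothetical definition written for this illustration, not an operator exported by magrittr. Because user-defined `%op%` operators associate left to right, it has to compose functions until it meets a non-function value. The functions `func1` to `func3` are placeholders.

```r
# Hypothetical left pipe: composes while the right-hand side is a function,
# and applies the accumulated function once it meets a value.
`%<%` <- function(f, rhs) {
  if (is.function(rhs)) function(...) f(rhs(...)) else f(rhs)
}

# Placeholder functions for the illustration
func1 <- function(x) x + 1
func2 <- function(x) x * 2
func3 <- function(x) x - 3

# Leftward flow: parsed as ((func3 %<% func2) %<% func1) %<% 5
y_left <- func3 %<% func2 %<% func1 %<% 5

# Rightward flow ending in a rightward assignment (base pipe, R >= 4.1.0)
5 |> func1() |> func2() |> func3() -> y_right

# Both agree with the nested call
stopifnot(y_left == func3(func2(func1(5))), y_right == y_left)
```

One limitation of this sketch: if the final value is itself a function, it is composed rather than applied.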

The example offered by David Robinson suggests that keeping track of arguments is a big issue, and I agree completely. My usual approach is using tabs and spaces to highlight the hierarchy:

func3 ( func2( 
           func1(x, a),    # think we need an extra comma here
               b, c),       # and here
        d, e, f) 

x %>% func1(a) %>% func2(b, c) %>% func3(d, e, f)

Admittedly this is made easier with a syntax-aware editor when checking for missing commas or parentheses, but in the example above, which was not done with one, the stacking/spacing method does highlight what I think was a syntax error. (I also quickly add argument names when having difficulties, but I think that would be equally applicable to piping code tactics.)

answered Oct 05 '22 20:10

IRTFM

Tags: r, magrittr

Johan Larsson
