What are the differences between R's new native pipe `|>` and the magrittr pipe `%>%`?

Tags: r, pipe, magrittr

In R 4.1 a native pipe operator was introduced that is "more streamlined" than previous implementations. I already noticed one difference between the native |> and the magrittr pipe %>%, namely 2 %>% sqrt works but 2 |> sqrt doesn't and has to be written as 2 |> sqrt(). Are there more differences and pitfalls to be aware of when using the new pipe operator?

asked May 21 '21 08:05 by sieste




5 Answers

In R 4.1, there was no placeholder syntax for the native pipe, so there was no equivalent of magrittr's . placeholder, and the following was impossible with |>.

c("dogs", "cats", "rats") %>% grepl("at", .)
#[1] FALSE  TRUE  TRUE

As of R 4.2, the native pipe can use _ as a placeholder but only with named arguments.

c("dogs", "cats", "rats") |> grepl("at", x = _)
#[1] FALSE  TRUE  TRUE

magrittr's . is still more flexible, as it can be repeated and can appear inside expressions.

c("dogs", "cats", "rats") %>% 
  paste(., ., toupper(.)) 
#[1] "dogs dogs DOGS" "cats cats CATS" "rats rats RATS"

c("dogs", "cats", "rats") |>
  paste(x = _, y = _) 
# Error in paste(x = "_", y = "_") : pipe placeholder may only appear once

It is also not clear how to use |> with a function that takes unnamed variadic arguments (i.e., ...). In this paste() example, we can make up x and y arguments to trick the placeholder into the correct place, but that feels hacky.

c("dogs", "cats", "rats") |>
  paste(x = "no", y = _) 
#[1] "no dogs" "no cats" "no rats"

Here are additional ways to work around the placeholder limitations:

  1. Write a separate function
find_at = function(x) grepl("at", x)
c("dogs", "cats", "rats") |> find_at()
#[1] FALSE  TRUE  TRUE
  2. Use an anonymous function

    a) Use the "old" syntax

    c("dogs", "cats", "rats") |> {function(x) grepl("at", x)}()

    b) Use the new anonymous function syntax

    c("dogs", "cats", "rats") |> {\(x) grepl("at", x)}()

  3. Specify the first parameter by name. The native pipe inserts the left-hand side as the first positional argument of the call, and R matches named arguments before positional ones, so once you name the first parameter the piped value fills the second one (and so on if you name more than one parameter).

c("dogs", "cats", "rats") |> grepl(pattern="at")
#> [1] FALSE  TRUE  TRUE
  • Examples 1 and 2 taken from - https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/
  • Example 3 taken from https://mobile.twitter.com/rlangtip/status/1409904500157161477
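As a quick sanity check, the three workarounds return the same result. This is a sketch assuming R >= 4.1.0 (for |> and \(x)); the variable names words, r1, r2 and r3 are illustrative only.

```r
words <- c("dogs", "cats", "rats")

# 1. A named helper function
find_at <- function(x) grepl("at", x)
r1 <- words |> find_at()

# 2. An anonymous function, wrapped in parentheses and then called
r2 <- words |> (\(x) grepl("at", x))()

# 3. Name the first parameter so the piped value fills `x`
r3 <- words |> grepl(pattern = "at")

stopifnot(identical(r1, c(FALSE, TRUE, TRUE)),
          identical(r1, r2),
          identical(r2, r3))
```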
answered Oct 21 '22 10:10 by Ronak Shah


The base R pipe |> added in R 4.1.0 "just" does functional composition: the parser rewrites it into a plain nested call. I.e. we can see that using it really is the same as writing the function call:

> 1:5 |> sum()             # simple use of |>
[1] 15
> deparse(substitute( 1:5 |> sum() ))
[1] "sum(1:5)"
> 

That has some consequences:

  • it makes it a little faster
  • it makes it a little simpler and more robust
  • it makes it a little more restrictive: sum() here needs the parens for a proper call
  • it limits uses of the 'implicit' data argument
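Because the rewrite happens in the parser, it can be inspected with quote() without evaluating anything. A minimal sketch assuming R >= 4.1.0; f, g, x and y are arbitrary symbols that never need to exist, since nothing is evaluated.

```r
# The parser turns a chain of pipes into an ordinary nested call
piped  <- quote(x |> f(y) |> g())
nested <- quote(g(f(x, y)))
stopifnot(identical(piped, nested))

# Deparsing shows the rewritten call; no `|>` survives
deparse(piped)
```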

This leads to the possible use of =>, which is currently "available but not active" (you need to set the environment variable _R_USE_PIPEBIND_ to enable it, and it may change in R 4.2.0).

(This was first offered as answer to a question duplicating this over here and I just copied it over as suggested.)

Edit: As the follow-up question of what => does keeps coming up, here is a quick example. Note that this operator is subject to change.

> Sys.setenv("_R_USE_PIPEBIND_"=TRUE)
> mtcars |> subset(cyl == 4) |> d => lm(mpg ~ disp, data = d)

Call:
lm(formula = mpg ~ disp, data = subset(mtcars, cyl == 4))

Coefficients:
(Intercept)         disp  
     40.872       -0.135  

> deparse(substitute(mtcars |> subset(cyl==4) |> d => lm(mpg ~ disp, data = d)))
[1] "lm(mpg ~ disp, data = subset(mtcars, cyl == 4))"
> 

The deparse(substitute(...)) is particularly nice here.

answered Oct 21 '22 12:10 by Dirk Eddelbuettel


The native pipe is implemented as a syntax transformation and so 2 |> sqrt() has no discernible overhead compared to sqrt(2), whereas 2 %>% sqrt() comes with a small penalty.

library(magrittr)  # needed for %>%

microbenchmark::microbenchmark(
  sqrt(1), 
  2 |> sqrt(), 
  3 %>% sqrt()
)

# Unit: nanoseconds
#          expr  min     lq    mean median   uq   max neval
#       sqrt(1)  117  126.5  141.66  132.0  139   246   100
#       sqrt(2)  118  129.0  156.16  134.0  145  1792   100
#  3 %>% sqrt() 2695 2762.5 2945.26 2811.5 2855 13736   100

Note how the expression 2 |> sqrt() passed to microbenchmark shows up as sqrt(2) in the output: the parser has already rewritten it. The same rewrite can be seen in

quote(2 |> sqrt())
# sqrt(2)
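The same kind of inspection suggests where the %>% overhead comes from: %>% stays in the parsed expression as a call to the %>% function, which has to run every time, while |> leaves only the call to sqrt. A sketch assuming R >= 4.1.0; quote() does not evaluate anything, so magrittr does not even need to be loaded here.

```r
native <- quote(2 |> sqrt())
mag    <- quote(2 %>% sqrt())

native[[1]]  # the function being called is sqrt
mag[[1]]     # the function being called is `%>%`

stopifnot(identical(native[[1]], as.name("sqrt")),
          identical(mag[[1]], as.name("%>%")))
```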
answered Oct 21 '22 11:10 by sieste


Topic                        Magrittr 2.0.3                        Base 4.2.0
Operator                     %>%                                   |>
Function call                1:3 %>% sum()                         1:3 |> sum()
                             1:3 %>% sum                           Needs brackets
                             mtcars %>% `$`(cyl)                   Some functions are not supported
Insert on first empty place  mtcars %>% lm(formula = mpg ~ disp)   mtcars |> lm(formula = mpg ~ disp)
Placeholder                  .                                     _
                             mtcars %>% lm(mpg ~ disp, data = . )  mtcars |> lm(mpg ~ disp, data = _ )
                             mtcars %>% lm(mpg ~ disp, . )         Needs named argument
                             1:3 %>% setNames(., .)                Can only appear once
                             1:3 %>% {sum(sqrt(.))}                Nested calls are not allowed
                             mtcars %>% .$cyl                      Invalid use of pipe placeholder (but here simply mtcars$cyl)
Environment                  Additional function environment       No extra environment: "x" |> assign(1)
Speed                        Slower: function call overhead        Faster: syntax transformation

Many differences and limitations disappear when using |> in combination with an (anonymous) function:
1 |> (\(.) .)()
-3:3 |> (\(.) sum(2*abs(.) - 3*.^2))()
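A quick check of the two anonymous-function examples above, using base R only (R >= 4.1.0). Inside the lambda, . is just an ordinary parameter name chosen to mimic magrittr; a and b are illustrative names.

```r
a <- 1 |> (\(.) .)()
b <- -3:3 |> (\(.) sum(2*abs(.) - 3*.^2))()

# -3:3 is (-3):3; sum(abs(.)) = 12 and sum(.^2) = 28, so 2*12 - 3*28 = -60
stopifnot(a == 1, b == -60)
```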


Needs brackets

library(magrittr)

1:3 |> sum
#Error: The pipe operator requires a function call as RHS

1:3 |> sum()
#[1] 6

1:3 %>% sum
#[1] 6

1:3 %>% sum()
#[1] 6
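The missing-parentheses error is raised by the parser, not at run time, which can be seen by parsing the code as text. A sketch assuming R >= 4.1.0; the exact error wording may differ between R versions, so only the failure itself is checked, and res and ok are illustrative names.

```r
# Parsing alone already fails, before anything is evaluated
res <- tryCatch(parse(text = "1:3 |> sum", keep.source = FALSE),
                error = function(e) e)
stopifnot(inherits(res, "error"))
conditionMessage(res)

# With parentheses the code parses into an ordinary call
ok <- parse(text = "1:3 |> sum()", keep.source = FALSE)[[1]]
stopifnot(identical(ok, quote(sum(1:3))))
```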

Some functions are not supported on the right-hand side, but they can still be called by wrapping them in parentheses, calling them via ::, calling them inside an anonymous function, or binding them to a new name:

mtcars |> `$`(cyl)
#Error: function '$' not supported in RHS call of a pipe

mtcars |> (`$`)(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars |> base::`$`(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars |> (\(.) .$cyl)()
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

fun <- `$`
mtcars |> fun(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars %>% `$`(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
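Another base R option, not mentioned above, is getElement(), which is an ordinary function and therefore needs no wrapping. A minimal sketch assuming R >= 4.1.0; note that, unlike $, getElement() matches list names exactly rather than partially. cyl_piped is an illustrative name.

```r
# Rewritten by the parser to getElement(mtcars, "cyl")
cyl_piped <- mtcars |> getElement("cyl")
stopifnot(identical(cyl_piped, mtcars$cyl))
```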

Placeholder needs named argument

2 |> setdiff(1:3, _)
#Error: pipe placeholder can only be used as a named argument

2 |> setdiff(1:3, y = _)
#[1] 1 3

2 |> (\(.) setdiff(1:3, .))()
#[1] 1 3

2 %>% setdiff(1:3, .)
#[1] 1 3

2 %>% setdiff(1:3, y = .)
#[1] 1 3
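Like the plain pipe, the placeholder is resolved by the parser, which simply substitutes the left-hand side for _. A sketch assuming R >= 4.2.0 (the first version with _); piped is an illustrative name.

```r
piped <- quote(2 |> setdiff(1:3, y = _))
stopifnot(identical(piped, quote(setdiff(1:3, y = 2))))

# Evaluating the rewritten call gives the same answer as above
stopifnot(all(eval(piped) == c(1, 3)))
```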

Placeholder can only appear once

1:3 |> setNames(object = _, nm = _)
#Error in setNames(object = "_", nm = "_") : 
#  pipe placeholder may only appear once

1:3 |> (\(.) setNames(., .))()
#1 2 3 
#1 2 3 

1:3 |> list() |> setNames(".") |> with(setNames(., .))
#1 2 3 
#1 2 3 

1:3 %>% setNames(object = ., nm = .)
#1 2 3
#1 2 3

1:3 %>% setNames(., .)
#1 2 3 
#1 2 3
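If a value is needed twice, a small helper can pass it to both arguments. The helper twice() is hypothetical, not part of base R or magrittr; this is a sketch assuming R >= 4.1.0.

```r
# Hypothetical helper: call f with the piped value as its first two arguments
twice <- function(x, f, ...) f(x, x, ...)

named <- 1:3 |> twice(setNames)   # same as setNames(1:3, 1:3)
named

stopifnot(identical(named, setNames(1:3, 1:3)),
          identical(names(named), c("1", "2", "3")))
```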

Nested calls are not allowed

1:3 |> sum(sqrt(x=_))
#Error in sum(1:3, sqrt(x = "_")) : invalid use of pipe placeholder

1:3 |> (\(.) sum(sqrt(.)))()
#[1] 4.146264

1:3 %>% {sum(sqrt(.))}
#[1] 4.146264
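Often the simplest base R alternative to a nested placeholder is to unnest the call into a longer chain, since each step receives the previous result as its first argument. A sketch assuming R >= 4.1.0; chained is an illustrative name.

```r
chained <- 1:3 |> sqrt() |> sum()   # rewritten to sum(sqrt(1:3))
chained
stopifnot(identical(chained, sum(sqrt(1:3))))
```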

No additional environment

assign("x", 1)
x
#[1] 1

"x" |> assign(2)
x
#[1] 2

"x" |> (\(x) assign(x, 3))()
x
#[1] 2

"x" %>% assign(4)
x
#[1] 2
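If assign() has to target a specific environment no matter how the call is wrapped, passing envir explicitly sidesteps the difference. A sketch assuming R >= 4.1.0; the environment e and the names v and w are hypothetical.

```r
# A fresh environment makes the target of each assignment explicit
e <- new.env()

"v" |> assign(1, envir = e)                 # assign("v", 1, envir = e)
"w" |> (\(nm) assign(nm, 2, envir = e))()   # wrapper function, same target

stopifnot(get("v", envir = e) == 1,
          get("w", envir = e) == 2)
```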

Other possibilities:
A different pipe-like syntax with a different placeholder is the Bizarro pipe ->.;, which is not actually a pipe (it has its disadvantages) and simply overwrites . in the current environment:

1:3 ->.; sum(.)
#[1] 6

mtcars ->.; .$cyl
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

1:3 ->.; setNames(., .)
#1 2 3 
#1 2 3 

1:3 ->.; sum(sqrt(x=.))
#[1] 4.146264

"x" ->.; assign(., 5)
x
#[1] 5

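It also evaluates differently: with ->.; each step runs to completion before the next one starts, whereas |> produces a nested call in which f2() starts before its lazily evaluated argument f1(x) is computed.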

x <- data.frame(a=0)
f1 <- \(x) {message("IN 1"); x$b <- 1; message("OUT 1"); x}
f2 <- \(x) {message("IN 2"); x$c <- 2; message("OUT 2"); x}

x ->.; f1(.) ->.; f2(.)
#IN 1
#OUT 1
#IN 2
#OUT 2
#  a b c
#1 0 1 2

x |> f1() |> f2()
#IN 2
#IN 1
#OUT 1
#OUT 2
#  a b c
#1 0 1 2

f2(f1(x))
#IN 2
#IN 1
#OUT 1
#OUT 2
#  a b c
#1 0 1 2
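The interleaved messages come from lazy evaluation: f2() starts running before its argument f1(x) has been computed. If the step-by-step order matters, forcing the argument at the top of each function restores it. A sketch assuming R >= 4.1.0; g1(), g2(), h1(), h2() and the events vector are hypothetical variants of f1()/f2() that record events instead of printing messages.

```r
events <- character()

# Same bodies as f1()/f2(), but recording events instead of messaging
g1 <- \(x) { events <<- c(events, "IN 1"); x$b <- 1; events <<- c(events, "OUT 1"); x }
g2 <- \(x) { events <<- c(events, "IN 2"); x$c <- 2; events <<- c(events, "OUT 2"); x }
res <- data.frame(a = 0) |> g1() |> g2()
lazy_order <- events

# force() evaluates the argument promise before anything else happens
events <- character()
h1 <- \(x) { force(x); events <<- c(events, "IN 1"); x$b <- 1; events <<- c(events, "OUT 1"); x }
h2 <- \(x) { force(x); events <<- c(events, "IN 2"); x$c <- 2; events <<- c(events, "OUT 2"); x }
res <- data.frame(a = 0) |> h1() |> h2()
forced_order <- events

stopifnot(identical(lazy_order, c("IN 2", "IN 1", "OUT 1", "OUT 2")),
          identical(forced_order, c("IN 1", "OUT 1", "IN 2", "OUT 2")))
```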

Alternatively, define your own operator, which also evaluates step by step:

":=" <- function(lhs, rhs) {
  e <- exists(".", parent.frame(), inherits = FALSE)
  . <- get0(".", envir = parent.frame(), inherits = FALSE)
  assign(".", lhs, envir=parent.frame())
  on.exit(if(identical(lhs, get0(".", envir = parent.frame(), inherits = FALSE))) {
            if(e) {
              assign(".", ., envir=parent.frame())
            } else {
              if(exists(".", parent.frame())) rm(., envir = parent.frame())
            }
          })
  eval(substitute(rhs), parent.frame())
}

. <- 0
"." := assign(., 1)
.
#[1] 1

1:3 := sum(.)
#[1] 6
.
#[1] 1

mtcars := .$cyl
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

1:3 := setNames(., .)
#1 2 3 
#1 2 3 

1:3 := sum(sqrt(x=.))
#[1] 4.146264

"x" := assign(., 6)
x
#[1] 6

1 := .+1 := .+2
#[1] 4

x <- data.frame(a=0)
x := f1(.) := f2(.)
#IN 1
#OUT 1
#IN 2
#OUT 2
#  a b c
#1 0 1 2

Speed

library(magrittr)

":=" <- function(lhs, rhs) {
  e <- exists(".", parent.frame(), inherits = FALSE)
  . <- get0(".", envir = parent.frame(), inherits = FALSE)
  assign(".", lhs, envir=parent.frame())
  on.exit(if(identical(lhs, get0(".", envir = parent.frame(), inherits = FALSE))) {
            if(e) {
              assign(".", ., envir=parent.frame())
            } else {
              if(exists(".", parent.frame())) rm(., envir = parent.frame())
            }
          })
  eval(substitute(rhs), parent.frame())
}

`%|%` <- function(lhs, rhs) {  #Overwrite and keep .
    assign(".", lhs, envir=parent.frame())
    eval(substitute(rhs), parent.frame())
}

x <- 42
bench::mark(min_time = 0.2, max_iterations = 1e8
, x
, identity(x)
, "|>" = x |> identity()
, "|> _" = x |> identity(x=_)
, "|> f()" = x |> (\(y) identity(y))()
, "%>%" = x %>% identity
, "->.;" = {x ->.; identity(.)}
, ":=" = x := identity(.)
, "%|%" = x %|% identity(.)
, "list." = x |> list() |> setNames(".") |> with(identity(.))
)

Result

#   expression       min   median `itr/sec` mem_alloc `gc/sec`   n_itr  n_gc
#   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>   <int> <dbl>
# 1 x             9.89ns  10.94ns 66611556.        0B     11.7 5708404     1
# 2 identity(x) 179.98ns 200.12ns  4272195.        0B     49.6  603146     7
# 3 |>          179.98ns 201.05ns  4238021.        0B     41.1  722534     7
# 4 |> _        189.87ns 219.91ns  4067314.        0B     39.4  722803     7
# 5 |> f()      410.01ns 451.11ns  1889295.        0B     44.6  339126     8
# 6 %>%           1.27µs   1.39µs   632255.    5.15KB     43.2  117210     8
# 7 ->.;        289.87ns 330.97ns  2581693.        0B     27.0  477389     5
# 8 :=            6.46µs   7.12µs   131921.        0B     48.8   24330     9
# 9 %|%           2.05µs   2.32µs   394515.        0B     43.2   73094     8
#10 list.         2.42µs   2.74µs   340220.     8.3KB     42.3   64324     8
answered Oct 21 '22 12:10 by GKi


One difference is their placeholder, _ in base R, . in magrittr.


Since R 4.2.0, the base R pipe has a placeholder for piped-in values, _, similar to %>%'s ., but its use is restricted to named arguments, and can only be used once per call.

The R 4.2.0 NEWS puts it this way: "It is now possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs."

To reiterate Ronak Shah's example, you can now use _ as a named argument on the right-hand side to refer to the left-hand side of the pipe:

c("dogs", "cats", "rats") |> 
    grepl("at", x = _)
#[1] FALSE  TRUE  TRUE

but it has to be named:

c("dogs", "cats", "rats") |> 
    grepl("at", _)
#Error: pipe placeholder can only be used as a named argument

and cannot appear more than once (to overcome this issue, one can still use the solutions provided by Ronak Shah):

c("dogs", "cats", "rats") |> 
  expand.grid(x = _, y = _)
# Error in expand.grid(x = "_", y = "_") : pipe placeholder may only appear once

This, however, works with magrittr:

library(magrittr)
c("dogs", "cats", "rats") %>% 
  expand.grid(x = ., y = .)
#     x    y
#1 dogs dogs
#2 cats dogs
#3 rats dogs
#4 dogs cats
#5 cats cats
#6 rats cats
#7 dogs rats
#8 cats rats
#9 rats rats
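In base R the repeated value can again be supplied through an anonymous function. A sketch assuming R >= 4.1.0; stringsAsFactors = FALSE only keeps the columns as plain character vectors, and grid is an illustrative name.

```r
grid <- c("dogs", "cats", "rats") |>
  (\(v) expand.grid(x = v, y = v, stringsAsFactors = FALSE))()
grid

stopifnot(nrow(grid) == 9,
          identical(grid$x[1:3], c("dogs", "cats", "rats")),
          all(grid$y[1:3] == "dogs"))
```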
answered Oct 21 '22 10:10 by Maël