In R 4.1 a native pipe operator was introduced that is "more streamlined" than previous implementations. I already noticed one difference between the native |> and the magrittr pipe %>%, namely 2 %>% sqrt works but 2 |> sqrt doesn't and has to be written as 2 |> sqrt(). Are there more differences and pitfalls to be aware of when using the new pipe operator?
magrittr: A Forward-Pipe Operator for R Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
Originally from the magrittr package, it's now used in many other packages as well. (If you're wondering where the magrittr name came from, it's a reference to Belgian artist Rene Magritte and one of his paintings, The Treachery of Images, that says in French: “This is not a pipe.”)
The dplyr package introduced the %.% operator to pass the left-hand side as an argument of the function on the right-hand side, similar to a *NIX pipe. The magrittr package is a much more lightweight package that exists to define only that pipe-like operator.
What does the pipe do? The pipe operator, written as %>% , has been a longstanding feature of the magrittr package for R. It takes the output of one function and passes it into another function as an argument. This allows us to link a sequence of analysis steps.
In R 4.1, there was no placeholder syntax for the native pipe. Thus, there was no equivalent of the . placeholder of magrittr, and the following was impossible with |>.
c("dogs", "cats", "rats") %>% grepl("at", .)
#[1] FALSE TRUE TRUE
As of R 4.2, the native pipe can use _ as a placeholder, but only with named arguments.
c("dogs", "cats", "rats") |> grepl("at", x = _)
#[1] FALSE TRUE TRUE
The . placeholder of magrittr is still more flexible, as it can be repeated and can appear inside expressions.
c("dogs", "cats", "rats") %>%
paste(., ., toupper(.))
#[1] "dogs dogs DOGS" "cats cats CATS" "rats rats RATS"
c("dogs", "cats", "rats") |>
paste(x = _, y = _)
# Error in paste(x = "_", y = "_") : pipe placeholder may only appear once
It is also not clear how to use |> with a function that takes unnamed variadic arguments (i.e., ...). In this paste() example, we can make up x and y arguments to trick the placeholder into the correct place, but that feels hacky.
c("dogs", "cats", "rats") |>
paste(x = "no", y = _)
#[1] "no dogs" "no cats" "no rats"
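As a side note on why the made-up names work at all: as far as I can tell, paste() collects everything that does not match one of its real formals into ... and ignores the names of those arguments. The sketch below (assuming R >= 4.2 for the _ placeholder; the variable names are only for illustration) compares the "tricked" call with a plain call.

```r
animals <- c("dogs", "cats", "rats")

# x and y are not formals of paste(), so both land in `...`
# and their names are ignored when the strings are pasted.
with_fake_names <- animals |> paste(x = "no", y = _)

# The same call written without the pipe and without made-up names.
plain <- paste("no", animals)

identical(with_fake_names, plain)
```

The trick would not work with a name that matches a real formal of paste() such as sep or collapse, because that argument would then no longer be part of ... at all.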
Here are additional ways to work around the placeholder limitations:
Write a separate function
find_at = function(x) grepl("at", x)
c("dogs", "cats", "rats") |> find_at()
#[1] FALSE TRUE TRUE
Use an anonymous function
a) Use the "old" syntax
c("dogs", "cats", "rats") |> {function(x) grepl("at", x)}()
b) Use the new anonymous function syntax
c("dogs", "cats", "rats") |> {\(x) grepl("at", x)}()
Specify the first parameter by name. This relies on the fact that the native pipe pipes into the first unnamed parameter, so if you provide a name for the first parameter it "overflows" into the second (and so on if you specify more than one parameter by name)
c("dogs", "cats", "rats") |> grepl(pattern="at")
#> [1] FALSE TRUE TRUE
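To make the "overflow" a bit more concrete, here is a small sketch (assuming R >= 4.1; the object names are made up for illustration). The parser simply puts the left-hand side in front of the other arguments as an unnamed argument, and R's ordinary argument matching does the rest: pattern is matched by name first, so the unnamed value falls into the next free formal, x.

```r
animals <- c("dogs", "cats", "rats")

# What the parser produces for the piped call ...
piped <- quote(animals |> grepl(pattern = "at"))
# ... and the equivalent call written by hand.
direct <- quote(grepl(animals, pattern = "at"))

# Both are the same call object.
same_call <- identical(piped, direct)

# pattern is taken by name, so `animals` is matched to x.
result <- animals |> grepl(pattern = "at")
result
```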
The base R pipe |> added in R 4.1.0 "just" does functional composition. I.e. we can see that its use really is just the same as the functional call:
> 1:5 |> sum() # simple use of |>
[1] 15
> deparse(substitute( 1:5 |> sum() ))
[1] "sum(1:5)"
>
That has some consequences:
sum() here needs the parens for a proper call.
This leads to possible use of =>, which is currently "available but not active" (for which you need to set the environment variable _R_USE_PIPEBIND_, and which may change for R 4.2.0).
(This was first offered as answer to a question duplicating this over here and I just copied it over as suggested.)
Edit: As the follow-up question on 'what is =>' comes up, here is a quick follow-up. Note that this operator is subject to change.
> Sys.setenv("_R_USE_PIPEBIND_"=TRUE)
> mtcars |> subset(cyl == 4) |> d => lm(mpg ~ disp, data = d)
Call:
lm(formula = mpg ~ disp, data = subset(mtcars, cyl == 4))
Coefficients:
(Intercept) disp
40.872 -0.135
> deparse(substitute(mtcars |> subset(cyl==4) |> d => lm(mpg ~ disp, data = d)))
[1] "lm(mpg ~ disp, data = subset(mtcars, cyl == 4))"
>
The deparse(substitute(...)) is particularly nice here.
The native pipe is implemented as a syntax transformation, so 2 |> sqrt() has no discernible overhead compared to sqrt(2), whereas 2 %>% sqrt() comes with a small penalty.
microbenchmark::microbenchmark(
sqrt(1),
2 |> sqrt(),
3 %>% sqrt()
)
# Unit: nanoseconds
# expr min lq mean median uq max neval
# sqrt(1) 117 126.5 141.66 132.0 139 246 100
# sqrt(2) 118 129.0 156.16 134.0 145 1792 100
# 3 %>% sqrt() 2695 2762.5 2945.26 2811.5 2855 13736 100
You see how the expression 2 |> sqrt() passed to microbenchmark is parsed as sqrt(2). This can also be seen in
quote(2 |> sqrt())
# sqrt(2)
Topic | Magrittr 2.0.3 | Base 4.2.0 |
---|---|---|
Operator | %>% | |> |
Function call | 1:3 %>% sum() | 1:3 |> sum() |
 | 1:3 %>% sum | Needs brackets |
 | mtcars %>% `$`(cyl) | Some functions are not supported |
Insert on first empty place | mtcars %>% lm(formula = mpg ~ disp) | mtcars |> lm(formula = mpg ~ disp) |
Placeholder | . | _ |
 | mtcars %>% lm(mpg ~ disp, data = . ) | mtcars |> lm(mpg ~ disp, data = _ ) |
 | mtcars %>% lm(mpg ~ disp, . ) | Needs named argument |
 | 1:3 %>% setNames(., .) | Can only appear once |
 | 1:3 %>% {sum(sqrt(.))} | Nested calls are not allowed |
 | mtcars %>% .$cyl | Invalid use of pipe placeholder, but in this case: mtcars$cyl |
Environment | Additional function environment | "x" |> assign(1) |
Speed | Slower because of function-call overhead | Faster because it is a syntax transformation |
Many differences and limitations disappear when using |> in combination with an (anonymous) function:
1 |> (\(.) .)()
-3:3 |> (\(.) sum(2*abs(.) - 3*.^2))()
Needs brackets
library(magrittr)
1:3 |> sum
#Error: The pipe operator requires a function call as RHS
1:3 |> sum()
#[1] 6
1:3 %>% sum
#[1] 6
1:3 %>% sum()
#[1] 6
Some functions are not supported, but some can still be called by placing them in brackets, calling them via the function ::, calling them inside a function, or defining a link to the function.
mtcars |> `$`(cyl)
#Error: function '$' not supported in RHS call of a pipe
mtcars |> (`$`)(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
mtcars |> base::`$`(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
mtcars |> (\(.) .$cyl)()
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
fun <- `$`
mtcars |> fun(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
mtcars %>% `$`(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
Placeholder needs named argument
2 |> setdiff(1:3, _)
#Error: pipe placeholder can only be used as a named argument
2 |> setdiff(1:3, y = _)
#[1] 1 3
2 |> (\(.) setdiff(1:3, .))()
#[1] 1 3
2 %>% setdiff(1:3, .)
#[1] 1 3
2 %>% setdiff(1:3, y = .)
#[1] 1 3
Placeholder can only appear once
1:3 |> setNames(object = _, nm = _)
#Error in setNames(object = "_", nm = "_") :
# pipe placeholder may only appear once
1:3 |> (\(.) setNames(., .))()
#1 2 3
#1 2 3
1:3 |> list() |> setNames(".") |> with(setNames(., .))
#1 2 3
#1 2 3
1:3 %>% setNames(object = ., nm = .)
#1 2 3
#1 2 3
1:3 %>% setNames(., .)
#1 2 3
#1 2 3
Nested calls are not allowed
1:3 |> sum(sqrt(x=_))
#Error in sum(1:3, sqrt(x = "_")) : invalid use of pipe placeholder
1:3 |> (\(.) sum(sqrt(.)))()
#[1] 4.146264
1:3 %>% {sum(sqrt(.))}
#[1] 4.146264
No additional Environment
assign("x", 1)
x
#[1] 1
"x" |> assign(2)
x
#[1] 2
"x" |> (\(x) assign(x, 3))()
x
#[1] 2
"x" %>% assign(4)
x
#[1] 2
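The outputs above can look puzzling at first, so here is a minimal sketch of what I believe is going on (assuming R >= 4.1; demo and nm are hypothetical names used only for this illustration). The native pipe is rewritten to a plain assign() call, so the variable is created in the frame where the pipe is written, whereas a wrapping function gets its own temporary frame and the assignment is lost when that function returns.

```r
demo <- function() {
  here <- environment()  # the frame of demo() itself

  # Parsed as assign("a", 1): "a" is created in demo()'s frame.
  "a" |> assign(1)

  # assign() runs inside the lambda, so "b" only ever exists in the
  # lambda's own frame, which is discarded once the call returns.
  "b" |> (\(nm) assign(nm, 2))()

  c(a = exists("a", envir = here, inherits = FALSE),
    b = exists("b", envir = here, inherits = FALSE))
}

res <- demo()
res
```

Passing an explicit envir argument to assign() would be one way to make the target environment unambiguous in either style.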
Other possibilities:
A different pipe operator and a different placeholder can be realized with the Bizarro pipe ->.; which is not a pipe (see disadvantages) and which overwrites the variable . in the current environment.
1:3 ->.; sum(.)
#[1] 6
mtcars ->.; .$cyl
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
1:3 ->.; setNames(., .)
#1 2 3
#1 2 3
1:3 ->.; sum(sqrt(x=.))
#[1] 4.146264
"x" ->.; assign(., 5)
x
#[1] 5
The Bizarro pipe also evaluates differently:
x <- data.frame(a=0)
f1 <- \(x) {message("IN 1"); x$b <- 1; message("OUT 1"); x}
f2 <- \(x) {message("IN 2"); x$c <- 2; message("OUT 2"); x}
x ->.; f1(.) ->.; f2(.)
#IN 1
#OUT 1
#IN 2
#OUT 2
# a b c
#1 0 1 2
x |> f1() |> f2()
#IN 2
#IN 1
#OUT 1
#OUT 2
# a b c
#1 0 1 2
f2(f1(x))
#IN 2
#IN 1
#OUT 1
#OUT 2
# a b c
#1 0 1 2
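The IN 2 / IN 1 ordering comes from R's lazy argument evaluation rather than from the pipe itself: x |> f1() |> f2() is just f2(f1(x)), and f1(x) is only evaluated when f2 first touches its argument. The following sketch (assuming R >= 4.1; g1, g2, g2_forced and calls are hypothetical names) records the order in which the function bodies start doing work and shows that force() at the top of the outer function flips it.

```r
calls <- character()

g1 <- \(x) { calls <<- c(calls, "g1"); x + 1 }
# g2 logs first and only touches x afterwards.
g2 <- \(x) { calls <<- c(calls, "g2"); x * 2 }
# g2_forced evaluates its argument before doing anything else.
g2_forced <- \(x) { force(x); calls <<- c(calls, "g2"); x * 2 }

lazy_result <- 1 |> g1() |> g2()
lazy_order <- calls            # outer function logged first

calls <- character()
forced_result <- 1 |> g1() |> g2_forced()
forced_order <- calls          # inner function logged first

lazy_order
forced_order
```

The computed values are the same in both cases; only the order of side effects differs, which matters mainly for messages, logging, or functions that modify state.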
Or define your own operator, which evaluates differently.
":=" <- function(lhs, rhs) {
e <- exists(".", parent.frame(), inherits = FALSE)
. <- get0(".", envir = parent.frame(), inherits = FALSE)
assign(".", lhs, envir=parent.frame())
on.exit(if(identical(lhs, get0(".", envir = parent.frame(), inherits = FALSE))) {
if(e) {
assign(".", ., envir=parent.frame())
} else {
if(exists(".", parent.frame())) rm(., envir = parent.frame())
}
})
eval(substitute(rhs), parent.frame())
}
. <- 0
"." := assign(., 1)
.
#[1] 1
1:3 := sum(.)
#[1] 6
.
#[1] 1
mtcars := .$cyl
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
1:3 := setNames(., .)
#1 2 3
#1 2 3
1:3 := sum(sqrt(x=.))
#[1] 4.146264
"x" := assign(., 6)
x
#[1] 6
1 := .+1 := .+2
#[1] 4
x <- data.frame(a=0)
x := f1(.) := f2(.)
#IN 1
#OUT 1
#IN 2
#OUT 2
# a b c
#1 0 1 2
Speed
library(magrittr)
":=" <- function(lhs, rhs) {
e <- exists(".", parent.frame(), inherits = FALSE)
. <- get0(".", envir = parent.frame(), inherits = FALSE)
assign(".", lhs, envir=parent.frame())
on.exit(if(identical(lhs, get0(".", envir = parent.frame(), inherits = FALSE))) {
if(e) {
assign(".", ., envir=parent.frame())
} else {
if(exists(".", parent.frame())) rm(., envir = parent.frame())
}
})
eval(substitute(rhs), parent.frame())
}
`%|%` <- function(lhs, rhs) { #Overwrite and keep .
assign(".", lhs, envir=parent.frame())
eval(substitute(rhs), parent.frame())
}
x <- 42
bench::mark(min_time = 0.2, max_iterations = 1e8
, x
, identity(x)
, "|>" = x |> identity()
, "|> _" = x |> identity(x=_)
, "|> f()" = x |> (\(y) identity(y))()
, "%>%" = x %>% identity
, "->.;" = {x ->.; identity(.)}
, ":=" = x := identity(.)
, "%|%" = x %|% identity(.)
, "list." = x |> list() |> setNames(".") |> with(identity(.))
)
Result
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
# 1 x 9.89ns 10.94ns 66611556. 0B 11.7 5708404 1
# 2 identity(x) 179.98ns 200.12ns 4272195. 0B 49.6 603146 7
# 3 |> 179.98ns 201.05ns 4238021. 0B 41.1 722534 7
# 4 |> _ 189.87ns 219.91ns 4067314. 0B 39.4 722803 7
# 5 |> f() 410.01ns 451.11ns 1889295. 0B 44.6 339126 8
# 6 %>% 1.27µs 1.39µs 632255. 5.15KB 43.2 117210 8
# 7 ->.; 289.87ns 330.97ns 2581693. 0B 27.0 477389 5
# 8 := 6.46µs 7.12µs 131921. 0B 48.8 24330 9
# 9 %|% 2.05µs 2.32µs 394515. 0B 43.2 73094 8
#10 list. 2.42µs 2.74µs 340220. 8.3KB 42.3 64324 8
One difference is their placeholder: _ in base R, . in magrittr.
Since R 4.2.0, the base R pipe has a placeholder for piped-in values, _, similar to %>%'s ., but its use is restricted to named arguments, and it can only be used once per call.
It is now possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs.
To reiterate Ronak Shah's example, you can now use _ as a named argument on the right-hand side to refer to the left-hand side of the pipe:
c("dogs", "cats", "rats") |>
grepl("at", x = _)
#[1] FALSE TRUE TRUE
but it has to be named:
c("dogs", "cats", "rats") |>
grepl("at", _)
#Error: pipe placeholder can only be used as a named argument
and cannot appear more than once (to overcome this issue, one can still use the solutions provided by Ronak Shah):
c("dogs", "cats", "rats") |>
expand.grid(x = _, y = _)
# Error in expand.grid(x = "_", y = "_") : pipe placeholder may only appear once
While this is possible with magrittr:
library(magrittr)
c("dogs", "cats", "rats") %>%
expand.grid(x = ., y = .)
# x y
#1 dogs dogs
#2 cats dogs
#3 rats dogs
#4 dogs cats
#5 cats cats
#6 rats cats
#7 dogs rats
#8 cats rats
#9 rats rats
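For completeness, the anonymous-function workaround mentioned earlier also covers this expand.grid() case with the native pipe. This is only a sketch, assuming R >= 4.1 for the \(x) shorthand; v is an arbitrary parameter name.

```r
animals <- c("dogs", "cats", "rats")

# Inside the lambda the piped value is an ordinary variable,
# so it can be used as many times as needed.
grid <- animals |> (\(v) expand.grid(x = v, y = v))()

dim(grid)
names(grid)
```

This gives the same 9 combinations as the magrittr version, at the cost of slightly noisier syntax.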