Can somebody explain to me why the following two instructions produce different outputs:
library(plyr)
library(dplyr)
ll <- list(a = mtcars, b = mtcars)
# using '.' as a function parameter
llply(ll, function(.) . %>% group_by(cyl) %>% summarise(min = min(mpg)))
# using 'd' as function parameter
llply(ll, function(d) d %>% group_by(cyl) %>% summarise(min = min(mpg)))
The former case is apparently not even evaluated, which I figured out by misspelling summarise; the following does not throw an error:
llply(ll, function(.) . %>% group_by(cyl) %>% sumamrise(min = min(mpg)))
So this has everything to do with scoping rules and where things are evaluated, but I really want to understand what is going on and why this happens. I use . as an argument in anonymous functions quite often, and I was puzzled to see the outcome.
So, long story short, why does . not work with %>%?
This seems to be because of the special use of . as a placeholder when using piping. From ?"%>%":
Using the dot for secondary purposes
Often, some attribute or property of lhs is desired in the rhs call in addition to the value of lhs itself, e.g. the number of rows or columns. It is perfectly valid to use the dot placeholder several times in the rhs call, but by design the behavior is slightly different when using it inside nested function calls. In particular, if the placeholder is only used in a nested function call, lhs will also be placed as the first argument! The reason for this is that in most use-cases this produces the most readable code. For example, iris %>% subset(1:nrow(.) %% 2 == 0) is equivalent to iris %>% subset(., 1:nrow(.) %% 2 == 0) but slightly more compact. It is possible to overrule this behavior by enclosing the rhs in braces. For example, 1:10 %>% {c(min(.), max(.))} is equivalent to c(min(1:10), max(1:10)).
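The two equivalences quoted above can be checked directly. This is a minimal sketch that assumes only that the magrittr package is installed; the variable names a, b and r are made up for illustration.

```r
library(magrittr)

# Dot used only inside a nested call: lhs is still inserted as the first argument,
# so both pipelines keep the even-numbered rows of iris.
a <- iris %>% subset(1:nrow(.) %% 2 == 0)
b <- iris %>% subset(., 1:nrow(.) %% 2 == 0)
identical(a, b)

# Braces around the rhs switch off the first-argument insertion,
# so the dot is the only place where lhs appears.
r <- 1:10 %>% {c(min(.), max(.))}
identical(r, c(min(1:10), max(1:10)))
```

Note that both examples have a real value on the left of the first %>%; neither covers the case in the question, where the dot itself is the left-hand side.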
The . ("the dot") has multiple uses, one of which is indeed as an argument. How it is actually interpreted depends heavily on its context, and in your context it is used immediately before a %>% forward-pipe operator. dplyr takes its forward-pipe operator from magrittr, and the magrittr documentation has the following snippet on what happens when there is a . %>% somefunction():
When the dot is used as lhs, the result will be a functional sequence, i.e. a function which applies the entire chain of right-hand sides in turn to its input.
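So it's almost like an order-of-operations thing: a %>% immediately after the dot makes the dot the start of a functional sequence, so the whole chain becomes a function instead of being applied to the argument named . in your anonymous function.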
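The behaviour is easy to reproduce without plyr or dplyr. The sketch below assumes only magrittr and uses sum and sqrt as stand-ins for group_by and summarise; f, g and h are hypothetical names chosen for the example.

```r
library(magrittr)

# A chain that starts with a bare dot is not evaluated; it builds a function.
f <- . %>% sum %>% sqrt
is.function(f)            # TRUE
f(c(9, 16))               # sqrt(9 + 16), i.e. 5

# Same chain inside an anonymous function whose parameter is named '.':
# the body still starts with a bare dot, so the call returns a function
# and the argument is never piped through.
g <- function(.) . %>% sum %>% sqrt
is.function(g(c(9, 16)))  # TRUE

# With any other parameter name the chain is evaluated as usual.
h <- function(d) d %>% sum %>% sqrt
h(c(9, 16))               # 5
```

This also matches the observation in the question that the misspelled function never raises an error: the chain is only stored, not run.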
One way to get your . understood as an argument instead is to wrap it in parentheses, i.e.
llply(ll, function(.) (.) %>% group_by(cyl) %>% summarise(min = min(mpg)))
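As a quick check of the parentheses workaround, here is a sketch that assumes only magrittr: base lapply stands in for llply, and subset/nrow stand in for the dplyr verbs. The names with_parens and without_parens are made up for the example.

```r
library(magrittr)

ll <- list(a = mtcars, b = mtcars)

# '(.)' is a call, not the bare dot symbol, so the chain is evaluated
# on the argument instead of being turned into a functional sequence.
with_parens <- lapply(ll, function(.) (.) %>% subset(cyl == 4) %>% nrow)
with_parens$a   # 11 four-cylinder cars in mtcars

# Without the parentheses each list element is a function, not a result.
without_parens <- lapply(ll, function(.) . %>% subset(cyl == 4) %>% nrow)
is.function(without_parens$a)   # TRUE
```

Renaming the parameter, as in the function(d) version from the question, avoids the ambiguity altogether and is probably the easier habit to keep.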
For a more thorough explanation of the different uses of . and %>%, and their interaction with each other, have a look at https://cran.r-project.org/web/packages/magrittr/magrittr.pdf. The relevant section starts on page 8.