I have been looking at a lot of answers and I still can't completely understand them. For example, the clearest one (here), among others (1, 2, 3), gives specific examples of the various uses of the dot, but I cannot understand, for example, its application here:
car_data <-
  mtcars %>%
  subset(hp > 100) %>%
  aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
  transform(kpl = mpg %>% multiply_by(0.4251)) %>%
  print
#result:
cyl mpg disp hp drat wt qsec vs am gear carb kpl
1 4 25.90 108.0 111.0 3.94 2.15 17.75 1.00 1.00 4.50 2.00 11.010
2 6 19.74 183.3 122.3 3.59 3.12 17.98 0.57 0.43 3.86 3.43 8.391
3 8 15.10 353.1 209.2 3.23 4.00 16.77 0.00 0.14 3.29 3.50 6.419
The code above is from an explanation of %>% in magrittr, where I'm also trying to understand the pipe operator. I know that it passes along the result of the previous computation, but I get lost in the aggregate line, where it mixes . and %>% inside the same function.
So I can't understand what the code above does. I have the result (shown above), but I don't get how it reaches that result, especially in the aggregate line, where it uses the dot and the ~ sign. I understand that ~ means "all other variables", but what does it mean together with the dot? Does the dot have another meaning or application? And what does the pipe operator do inside a specific function?
That line uses the . in three different ways.
         [1]             [2]      [3]
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))
Generally speaking, you use . to pass the value from the pipe into your function at a specific location, but there are some exceptions. One exception is when the . is in a formula. The ~ is used to create formulas in R. The pipe won't change the meaning of the formula, so a . inside it behaves just as it would outside a pipe. For example:
aggregate(. ~ cyl, data = mydata, FUN = mean)
The . is needed there only because aggregate requires a formula with both a left-hand and a right-hand side. So the . at [1] just means "all the other columns in the dataset." This use is not at all related to magrittr.
The . at [2] is the value that's being passed in by the pipe. If you have a plain . as a parameter to the function, that's where the value will be placed. So the result of the subset() will go to the data= parameter.
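To make the placeholder at [2] concrete, here is a minimal sketch that compares the piped call with an unpiped one. It assumes magrittr is installed and uses the simpler formula mpg ~ cyl so that only the data = . part is in play; the names big_hp, piped and plain are made up for illustration.

```r
library(magrittr)

# The rows that subset() keeps, computed without a pipe.
big_hp <- subset(mtcars, hp > 100)

# Piped: the `.` in `data = .` receives the output of subset().
piped <- mtcars %>%
  subset(hp > 100) %>%
  aggregate(mpg ~ cyl, data = ., FUN = mean)

# Unpiped equivalent: the data frame is passed to `data` explicitly.
plain <- aggregate(mpg ~ cyl, data = big_hp, FUN = mean)

# Compare the two results.
print(all.equal(piped, plain))
```

Because a plain . appears as an argument, magrittr does not also insert the piped value as the first argument, which is why the formula can stay in first position.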
The magrittr library also allows you to define anonymous functions with the . variable. If you have a chain that starts with a ., it's treated like a function. So
. %>% mean %>% round(2)
is the same as
function(x) round(mean(x), 2)
so you're just creating a custom function with the . at [3].
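A small sketch of the equivalence described above, assuming magrittr is installed. The names mean_2dp and mean_2dp_plain and the sample vector are hypothetical, chosen only for this illustration.

```r
library(magrittr)

# A chain that starts with `.` is not run immediately; it returns a function.
mean_2dp <- . %>% mean %>% round(2)

# The hand-written equivalent from the answer.
mean_2dp_plain <- function(x) round(mean(x), 2)

x <- c(1.234, 5.678, 9.1011)  # arbitrary illustrative values

# Both calls apply mean() and then round(..., 2) to x.
print(mean_2dp(x))
print(mean_2dp_plain(x))
```

This is why it can be handed to FUN =: aggregate only needs something it can call on each group of values.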
Dot is used in three ways in the aggregate statement:
aggregate.formula: The formula method of aggregate specifies a formula in which the left-hand side (LHS) of the ~ defines the variables to apply the function to and the right-hand side of the ~ defines the variables to group by. It uses dot in the formula to mean all other variables not already mentioned in the formula. For example, using the built-in ToothGrowth data frame, which has columns len, supp and dose, these are the same. We group by supp, whereas mean acts on each of len and dose.
aggregate(. ~ supp, ToothGrowth, mean)
aggregate(cbind(len, dose) ~ supp, ToothGrowth, mean)
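One way to check that claim is to store both results and compare them. This is only a sketch using base R; the variable names by_dot and by_cbind are made up here.

```r
# Dot on the LHS: "every column not mentioned on the RHS".
by_dot <- aggregate(. ~ supp, ToothGrowth, mean)

# The same request with the LHS columns spelled out.
by_cbind <- aggregate(cbind(len, dose) ~ supp, ToothGrowth, mean)

# Compare the two data frames and show which columns were aggregated.
print(all.equal(by_dot, by_cbind))
print(names(by_dot))
```

Note that no pipe is involved at all, which shows that this meaning of dot comes from the formula interface rather than from magrittr.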
RHS of pipe: When used on the right-hand side (RHS) of a pipe, magrittr uses dot to represent the input, i.e. whatever is on the left-hand side of the pipe. Thus, these are the same:
4 %>% sqrt(.) # use of dot on RHS
sqrt(4)
LHS of pipe: When used on the left-hand side (LHS) of a pipe, magrittr uses dot to represent a function definition. For example, these two function definitions both define a function that squares its argument:
square1 <- . %>% .^2 # use of dot on LHS
square2 <- function(x) x^2
Perhaps it is easiest to see if we write the example in the question without dot:
mtcars0 <- mtcars %>%
  subset(hp > 100)

aggregate(
  cbind(mpg, disp, hp, drat, wt, qsec, vs, am, gear, carb) ~ cyl,  # cbind(...) in place of .
  data = mtcars0,                                                  # mtcars0 in place of .
  FUN = function(x) round(mean(x), 2))                             # instead of . %>% etc.
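As a sanity check, the dot-free version can be compared against the aggregate step from the question. This sketch assumes magrittr is installed, stops before the transform and print steps, and uses the hypothetical names with_dots and without_dots.

```r
library(magrittr)

# The aggregate step exactly as written in the question, with all three dots.
with_dots <- mtcars %>%
  subset(hp > 100) %>%
  aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))

# The same computation with every dot written out by hand.
mtcars0 <- subset(mtcars, hp > 100)
without_dots <- aggregate(
  cbind(mpg, disp, hp, drat, wt, qsec, vs, am, gear, carb) ~ cyl,
  data = mtcars0,
  FUN = function(x) round(mean(x), 2))

# Compare the two data frames.
print(all.equal(with_dots, without_dots))
```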