
R combinations with dot ("."), "~", and pipe (%>%) operator

I have looked through a lot of answers and still can't completely understand them. For example, the clearest one (here), among others (1, 2, 3), gives specific examples of the various uses of the dot, but I cannot understand, for example, how it is applied here:

library(magrittr)

car_data <- 
  mtcars %>%
  subset(hp > 100) %>%
  aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
  transform(kpl = mpg %>% multiply_by(0.4251)) %>%
  print

#result:
  cyl   mpg  disp    hp drat   wt  qsec   vs   am gear carb    kpl
1   4 25.90 108.0 111.0 3.94 2.15 17.75 1.00 1.00 4.50 2.00 11.010
2   6 19.74 183.3 122.3 3.59 3.12 17.98 0.57 0.43 3.86 3.43  8.391
3   8 15.10 353.1 209.2 3.23 4.00 16.77 0.00 0.14 3.29 3.50  6.419

The code above is from an explanation of %>% in magrittr, where I'm also trying to understand the pipe operator. I know that it passes along the result of the previous computation, but I get lost in the aggregate line, where it mixes . and %>% inside the same function call.

So I can't understand what the code above does. I have the result (shown above), but I don't get how it reaches that result, especially the aggregate line, where it uses the dot and the ~ sign. I know that ~ means "all other variables", but what does the dot mean there? Does it have another meaning or application? And what does the pipe operator do inside a specific function?

Chris asked Feb 21 '19 20:02



2 Answers

That line uses the . in three different ways.

         [1]             [2]      [3]
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))

Generally speaking, you use . to pass the value from the pipe into your function at a specific location, but there are some exceptions. One exception is when the . is in a formula. The ~ is used to create formulas in R. The pipe won't change the meaning of the formula, so the . there behaves exactly as it would without the pipe. For example

aggregate(. ~ cyl, data=mydata)

And that's just because aggregate requires a formula with both a left and right hand side. So the . at [1] just means "all the other columns in the dataset." This use is not at all related to magrittr.

The . at [2] is the value that's being passed in by the pipe. If you have a plain . as a parameter to the function, that's where the value will be placed. So the result of the subset() will go to the data= parameter.
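As a rough illustration of [1] and [2] together, the sketch below compares the piped call with an unpiped version. The names piped, filtered and unpiped are made up for this example, and FUN is simplified to plain mean so that only the two dots are in play.

```r
library(magrittr)

# Piped form: the result of subset() is placed where the plain "." argument
# appears (data = .). The "." inside the formula is left alone by the pipe.
piped <- mtcars %>%
  subset(hp > 100) %>%
  aggregate(. ~ cyl, data = ., FUN = mean)

# Unpiped form: the intermediate data frame is stored and passed by name.
filtered <- subset(mtcars, hp > 100)
unpiped <- aggregate(. ~ cyl, data = filtered, FUN = mean)

# Compare the two results.
all.equal(piped, unpiped)
```

In both versions the . in the formula is interpreted by aggregate itself, which is why it stays as written when the pipe is removed.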

The magrittr library also allows you to define anonymous functions with the . variable. If you have a chain that starts with a ., it's treated like a function. So

. %>% mean %>% round(2)

is the same as

function(x) round(mean(x), 2)

So you're just creating a custom function with the . at [3].
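A small check of that equivalence, as a sketch: round_mean and round_mean_explicit are names invented here, and mtcars$mpg is used only as convenient sample input.

```r
library(magrittr)

# A chain that starts with "." is not evaluated right away;
# magrittr returns a function that runs the chain on its argument.
round_mean <- . %>% mean %>% round(2)

# The hand-written equivalent.
round_mean_explicit <- function(x) round(mean(x), 2)

# The piped definition is an ordinary callable function.
is.function(round_mean)

# Both versions applied to the same input.
identical(round_mean(mtcars$mpg), round_mean_explicit(mtcars$mpg))
```

Because the chain is a function, it can be handed to any argument that expects one, such as FUN in aggregate.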

MrFlick answered Nov 13 '22 02:11



Dot is used in three ways in the aggregate statement:

  • aggregate.formula: the formula method of aggregate takes a formula in which the left hand side (LHS) of the ~ names the variables to apply the function to and the right hand side (RHS) names the variables to group by. A dot in the formula means all other variables not already mentioned in the formula. For example, using the built-in ToothGrowth data frame, which has columns len, supp and dose, these two calls are the same: we group by supp, and mean acts on each of len and dose.

    aggregate(. ~ supp, ToothGrowth, mean)
    aggregate(cbind(len, dose) ~ supp, ToothGrowth, mean)
    
  • RHS of pipe: when used on the right hand side (RHS) of a pipe, magrittr uses dot to represent the input, i.e. whatever is on the left hand side of the pipe. Thus, these are the same:

    4 %>% sqrt(.) # use of dot on RHS
    sqrt(4)
    
  • LHS of pipe: when a dot is used on the left hand side of a pipe, magrittr treats the pipeline as a function definition. For example, these two function definitions both define a function that squares its argument:

    square1 <- . %>% .^2 # use of dot on LHS
    square2 <- function(x) x^2
    

Perhaps it is easiest to see if we write the example in the question without dot:

mtcars0 <-  mtcars %>%
  subset(hp > 100)

aggregate(
  cbind(mpg,disp,hp,drat,wt,qsec,vs,am,gear,carb) ~ cyl,  # cbind(...) in place of .
  data = mtcars0, # mtcars0 in place of .
  FUN = function(x) round(mean(x), 2)) # instead of . %>% etc.
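The rewrite above stops before the transform step of the original pipeline. As a sketch of what the pipe does inside that call, the snippet below assumes that multiply_by is magrittr's alias for `*` and compares the piped expression with plain multiplication. The names agg, with_pipe and without_pipe are made up for this example.

```r
library(magrittr)

mtcars0 <- subset(mtcars, hp > 100)

# The dot-free aggregate call from above, stored for reuse.
agg <- aggregate(
  cbind(mpg, disp, hp, drat, wt, qsec, vs, am, gear, carb) ~ cyl,
  data = mtcars0,
  FUN = function(x) round(mean(x), 2))

# Inside transform(), mpg refers to the column of agg, and the pipe
# feeds that column to multiply_by() as its first argument.
with_pipe <- transform(agg, kpl = mpg %>% multiply_by(0.4251))

# The same column computed with ordinary multiplication.
without_pipe <- transform(agg, kpl = mpg * 0.4251)

all.equal(with_pipe, without_pipe)
```

Here the pipe is just an ordinary expression evaluated inside the function argument; it has no connection to the outer pipeline.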
G. Grothendieck answered Nov 13 '22 01:11
