
How to use the %.% operator in R (EDIT: operator deprecated in 2014)

EDIT: %.% operator is now deprecated. Use %>% from magrittr.

ORIGINAL QUESTION: What does the %.% operator do? I've seen it used a lot with the dplyr package, but I can't find any supporting documentation on what it is or how it works.

It seems to chain commands together, but that's as far as I can tell. While I'm at it, can anyone explain what the whole gamut of special operators wrapped in % signs do, and when it is technically the right time to use them to write better code?

testname123 asked Mar 11 '14 01:03



2 Answers

I think Hadley would be the best person to explain this, but I will give it a shot.

%.% is a binary operator called the chain operator. In R you can define pretty much any binary operator of your own by wrapping its name in % characters. From what I have seen, these are mostly used to make "chainable" syntax easier (like x + y, which reads much better than sum(x, y)). You can do really cool stuff with them, see this cool example here.
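As a rough illustration of defining your own operator, here is a minimal sketch in base R. The operator name %+% and the variable names are made up for this example and do not come from dplyr:

```r
# `%+%` is a hypothetical infix operator, defined here only for illustration.
# Any function of two arguments whose name is wrapped in % can be used infix.
`%+%` <- function(lhs, rhs) paste0(lhs, rhs)

# Infix use
word <- "chain" %+% "able"

# Custom operators associate left to right, so they can be strung together:
# this is evaluated as ("a" %+% "b") %+% "c"
joined <- "a" %+% "b" %+% "c"

# The infix form is just sugar for an ordinary function call
same <- `%+%`("chain", "able")
```

The left-to-right evaluation is what makes this kind of operator a natural fit for chaining.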

What is the purpose of %.% in dplyr? To make it easier for you to express yourself, reducing the gap between what you want to do and how you express it.

Taking the example from the introduction to dplyr, let's suppose you want to group flights by year, month and day, select those variables plus the arrival and departure delays, summarise the delays by their mean, and then keep only the days where the mean arrival or departure delay is over 30. Without %.%, you would have to write it like this:

filter(
  summarise(
    select(
      group_by(hflights, Year, Month, DayofMonth),
      Year:DayofMonth, ArrDelay, DepDelay
    ),
    arr = mean(ArrDelay, na.rm = TRUE),
    dep = mean(DepDelay, na.rm = TRUE)
  ),
  arr > 30 | dep > 30
)

It does the job, but it is pretty difficult to write and to read, because the operations appear inside out. Now, you can write the same thing with a friendlier syntax using the chain operator %.%:

hflights %.%
  group_by(Year, Month, DayofMonth) %.%
  select(Year:DayofMonth, ArrDelay, DepDelay) %.%
  summarise(
    arr = mean(ArrDelay, na.rm = TRUE),
    dep = mean(DepDelay, na.rm = TRUE)
  ) %.%
  filter(arr > 30 | dep > 30)

It is easier both to write and read!

And how does that work?

Let's take a look at the definitions. First for %.%:

function (x, y) 
{
    chain_q(list(substitute(x), substitute(y)), env = parent.frame())
}

It uses another function called chain_q. So let's look at it:

function (calls, env = parent.frame()) 
{
    if (length(calls) == 0) 
        return()
    if (length(calls) == 1) 
        return(eval(calls[[1]], env))
    e <- new.env(parent = env)
    e$`__prev` <- eval(calls[[1]], env)
    for (call in calls[-1]) {
        new_call <- as.call(c(call[[1]], quote(`__prev`), as.list(call[-1])))
        e$`__prev` <- eval(new_call, e)
    }
    e$`__prev`
}

What does that do?

To simplify things, let's assume you called: group_by(hflights, Year, Month, DayofMonth) %.% select(Year:DayofMonth, ArrDelay, DepDelay).

Your calls x and y are then group_by(hflights, Year, Month, DayofMonth) and select(Year:DayofMonth, ArrDelay, DepDelay), respectively. The function creates a new environment called e (e <- new.env(parent = env)) and stores in it an object called __prev holding the result of evaluating the first call (e$`__prev` <- eval(calls[[1]], env)). Then, for each remaining call, it builds a new call whose first argument is the result of the previous call, that is, __prev. In our case that would be select(`__prev`, Year:DayofMonth, ArrDelay, DepDelay). Each new call is evaluated and its result is stored back into __prev, so the loop "chains" the calls.

Since you can use binary operators one after another, you can actually use this syntax to express very complex manipulations in a very readable way.
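To see the mechanism run without dplyr installed, here is a small sketch that reuses the same idea in base R. The names %then% and chain_sketch are hypothetical, the built-in mtcars data stands in for hflights, and the empty-input checks from chain_q are left out, so treat it as an illustration rather than dplyr's actual code:

```r
# chain_sketch mirrors the loop in chain_q shown above (hypothetical name).
chain_sketch <- function(calls, env = parent.frame()) {
  e <- new.env(parent = env)
  # Evaluate the first (left-most) expression and remember its result
  e$`__prev` <- eval(calls[[1]], env)
  for (call in calls[-1]) {
    # Rebuild f(a, b) as f(`__prev`, a, b), then evaluate it inside e
    new_call <- as.call(c(call[[1]], quote(`__prev`), as.list(call[-1])))
    e$`__prev` <- eval(new_call, e)
  }
  e$`__prev`
}

# `%then%` is a hypothetical stand-in for %.%: it captures both sides
# unevaluated and hands them to chain_sketch.
`%then%` <- function(x, y) {
  chain_sketch(list(substitute(x), substitute(y)), env = parent.frame())
}

# Parsed as (mtcars %then% subset(cyl == 4)) %then% head(3)
chained <- mtcars %then% subset(cyl == 4) %then% head(3)

# The equivalent nested form, for comparison
nested <- head(subset(mtcars, cyl == 4), 3)
```

Both forms should give the same data frame. The chained one just lets you read the steps in the order they happen.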

Carlos Cinelli answered Sep 29 '22 16:09


A quick search landed me here:

dplyr provides another innovation over plyr: the ability to chain operations together from left to right with the %.% operator. This makes dplyr behave a little like a grammar of data manipulation.

Example:

Batting %.%
  group_by(playerID) %.%
  summarise(total = sum(G)) %.%
  arrange(desc(total)) %.%
  head(5)

Read more about it from the help section, ?"%.%".
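Since %.% has since been deprecated (see the EDIT in the question), here is a sketch of the same left-to-right shape using base R's native pipe |>, which needs R 4.1 or later. Note this swaps in the native pipe rather than magrittr's %>% so that no package is required, and it uses the built-in mtcars data in place of Batting. The variable name is made up for the example:

```r
# Base R only; `|>` passes the left-hand side as the first argument
# of the call on the right-hand side.
small_cars <- mtcars |>
  subset(cyl == 4) |>                 # keep 4-cylinder cars
  transform(hp_per_wt = hp / wt) |>   # add a derived column
  head(5)                             # first five rows
```

With magrittr or a current dplyr loaded, replacing |> with %>% in this snippet should behave the same way.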

rkmorgan answered Sep 29 '22 18:09