pass grouped dataframe to own function in dplyr

Tags: r, dplyr, plyr

I am trying to switch from plyr to dplyr. However, I still can't figure out how to call my own functions inside a chained dplyr call.

I have a data frame with a factor ID variable and an order variable. I want to split the frame by ID, sort each piece by the order variable, and add a sequence number in a new column.

My plyr code looks like this:

f <- function(x) cbind(x[order(x$order_variable), ], Experience = 0:(nrow(x)-1))
data <- ddply(data, .(ID_variable), f)

In dplyr I thought this should look something like this:

f <- function(x) cbind(x[order(x$order_variable), ], Experience = 0:(nrow(x)-1))
data <- data %>% group_by(ID_variable) %>% f

Can anyone tell me how to modify my dplyr call so that it passes the data to my own function and gives the same result as my plyr call?

EDIT: If I use the dplyr call as described here, it DOES pass an object to f. However, plyr passes a separate table for each group (split by the ID variable), whereas dplyr passes the ENTIRE table as a single grouped object in which the groups are only annotated. As a result, when I cbind the Experience variable, the counter runs from 0 to the number of rows in the whole table minus one instead of restarting within each group.
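The behaviour described in this edit can be reproduced with a small sketch. The data frame `toy` and the helper `rows_seen` are made-up names for illustration only, and the example assumes dplyr is installed in a version that provides `group_map()`.

```r
library(dplyr)

# Hypothetical toy data: ID "a" has 2 rows, ID "b" has 3 rows
toy <- data.frame(
  ID_variable    = factor(c("a", "a", "b", "b", "b")),
  order_variable = c(2, 1, 3, 1, 2)
)

# Hypothetical helper that reports how many rows it was handed
rows_seen <- function(x) nrow(x)

grouped <- toy %>% group_by(ID_variable)

# Piping straight into the function hands over the whole grouped table
whole <- grouped %>% rows_seen()

# group_map() calls the function once per group and returns a list
per_group <- grouped %>% group_map(~ rows_seen(.x))
```

Here `whole` counts all five rows, while `per_group` holds one row count per ID, which is the per-group splitting that plyr did implicitly.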

I have found a way to get the same functionality in dplyr using this approach:

data <- data %>%
    group_by(ID_variable) %>%
    arrange(ID_variable, order_variable) %>%
    mutate(Experience = 0:(n()-1))

However, I would still like to learn how to have dplyr split a grouped data frame into one table per group and pass each table to my own function.

asked Jan 28 '15 19:01 by Phil


1 Answer

For those who get here from Google: let's say you wrote your own print function.

printFunction <- function(dat) print(dat)
df <- data.frame(a = 1:6, b = 1:2)

Calling it the way the question does,

df %>% 
    group_by(b) %>% 
    printFunction(.)

prints the entire data frame at once. To make dplyr call the function once per group, so that one table is printed for each group, wrap the call in do():

df %>% 
    group_by(b) %>% 
    do(printFunction(.))
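The same pattern should carry over to the function from the question. Below is a minimal sketch under a few assumptions: `toy` is made-up data standing in for the asker's data frame, and `group_modify()` is shown as an alternative that exists in newer dplyr releases, where `do()` is marked as superseded.

```r
library(dplyr)

# Hypothetical toy data standing in for the question's data frame
toy <- data.frame(
  ID_variable    = factor(c("a", "a", "b", "b", "b")),
  order_variable = c(2, 1, 3, 1, 2)
)

# The function from the question, unchanged apart from spacing
f <- function(x) cbind(x[order(x$order_variable), ], Experience = 0:(nrow(x) - 1))

# do() passes each group's rows to f as `.`
via_do <- toy %>%
    group_by(ID_variable) %>%
    do(f(.)) %>%
    ungroup()

# group_modify() passes each group's rows as `.x` (without the grouping
# column) and adds the grouping column back to the combined result
via_modify <- toy %>%
    group_by(ID_variable) %>%
    group_modify(~ f(.x)) %>%
    ungroup()
```

In both results the Experience counter restarts at 0 within each ID, which matches what the ddply call in the question produces.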
answered Oct 24 '22 08:10 by Kipras Kančys