
dplyr: How to apply do() on result of group_by?

I'd like to use dplyr to group a table by one column, then apply a function to the set of values in the second column of each group.

For instance, in the code example below, I'd like to return all of the 2-item combinations of foods eaten by each person. I cannot figure out how to pass the right column (foods) to the function inside do().

library(dplyr)

person = c( 'Grace', 'Grace', 'Grace', 'Rob', 'Rob', 'Rob' )
foods   = c( 'apple', 'banana', 'cucumber', 'spaghetti', 'cucumber', 'banana' )
eaten  = data.frame(person, foods)

by_person = group_by(eaten, person)

# How to do this?
do( by_person, combn( x = foods, m = 2 ) )

Note that the example code in ?do fails on my machine:

mods <- do(carriers, failwith(NULL, lm), formula = ArrDelay ~ date)
asked Mar 04 '14 20:03 by zimmeee


1 Answer

Let us define eaten like this:

eaten <- data.frame(person, foods, stringsAsFactors = FALSE)

1) Then try this:

eaten %.% group_by(person) %.% do(function(x) combn(x$foods, m = 2))

giving:

[[1]]
     [,1]     [,2]       [,3]      
[1,] "apple"  "apple"    "banana"  
[2,] "banana" "cucumber" "cucumber"

[[2]]
     [,1]        [,2]        [,3]      
[1,] "spaghetti" "spaghetti" "cucumber"
[2,] "cucumber"  "banana"    "banana"  
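
As a cross-check that does not depend on any dplyr version, the same per-group computation can be sketched in base R. This is not part of the approach above; it only assumes that foods is a character vector (hence stringsAsFactors = FALSE), and the name combos is chosen here for illustration.

```r
person <- c("Grace", "Grace", "Grace", "Rob", "Rob", "Rob")
foods  <- c("apple", "banana", "cucumber", "spaghetti", "cucumber", "banana")
eaten  <- data.frame(person, foods, stringsAsFactors = FALSE)

# split() cuts the foods vector into one piece per person;
# lapply() then calls combn(piece, m = 2) on each piece.
# "combos" is an illustrative name, not something from dplyr.
combos <- lapply(split(eaten$foods, eaten$person), combn, m = 2)

combos  # a list with one 2-row character matrix per person
```

The result is a list named by person, so combos$Grace and combos$Rob can be compared directly with the matrices printed above.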

2) To get close to what @Hadley describes in the comments without waiting for a future version of dplyr, try the following, where do2 is found here:

library(gsubfn)
eaten %.% group_by(person) %.% fn$do2(~ combn(.$foods, m = 2))

giving:

$Grace
     [,1]     [,2]       [,3]      
[1,] "apple"  "apple"    "banana"  
[2,] "banana" "cucumber" "cucumber"

$Rob
     [,1]        [,2]        [,3]      
[1,] "spaghetti" "spaghetti" "cucumber"
[2,] "cucumber"  "banana"    "banana"  
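
For readers on a more recent dplyr, where %>% has replaced %.%, a rough equivalent is sketched below. It assumes a dplyr release in which summarise() can build list columns, and it is not necessarily the interface referred to in the comments; the names result and combos are illustrative.

```r
library(dplyr)

person <- c("Grace", "Grace", "Grace", "Rob", "Rob", "Rob")
foods  <- c("apple", "banana", "cucumber", "spaghetti", "cucumber", "banana")
eaten  <- data.frame(person, foods, stringsAsFactors = FALSE)

# Assumption: a dplyr version that provides %>% and list columns in summarise().
# Wrapping combn() in list() keeps each group's matrix as a single cell.
result <- eaten %>%
  group_by(person) %>%
  summarise(combos = list(combn(foods, m = 2)))

result$combos[[1]]  # the matrix for the first person in result$person
```

Here each person gets one row, and the matrix of pairs is stored in the combos list column rather than returned as a bare list.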

Note: The last line of the question, which gives the code from the help file, also fails for me. This variation works for me: do(jan, lm, formula = ArrDelay ~ date).

answered Oct 22 '22 13:10 by G. Grothendieck