
dplyr issues when using group_by(multiple variables)

I want to start using dplyr in place of ddply but I can't get a handle on how it works (I've read the documentation).

For example, why when I try to mutate() something does the "group_by" function not work as it's supposed to?

Looking at mtcars:

library(dplyr)  # mtcars ships with base R (datasets package), so car is not needed

Say I make a data.frame which is a summary of mtcars, grouped by "cyl" and "gear":

df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(
    newvar = sum(wt)
  )

Then say I want to further summarise this data frame. With ddply, it'd be straightforward, but when I try to do this with dplyr, it's not actually "grouping by":

df2 <- df1 %>%
  group_by(cyl) %>%
  mutate(
    newvar2 = newvar + 5
  )

Still yields an ungrouped output:

  cyl gear newvar newvar2
1   6    3  6.675  11.675
2   4    4 19.025  24.025
3   6    4 12.375  17.375
4   6    5  2.770   7.770
5   4    3  2.465   7.465
6   8    3 49.249  54.249
7   4    5  3.653   8.653
8   8    5  6.740  11.740

Am I doing something wrong with the syntax?


Edit:

If I were to do this with plyr and ddply:

df1 <- ddply(mtcars, .(cyl, gear), summarise, newvar = sum(wt)) 

and then to get the second df:

df2 <- ddply(df1, .(cyl), summarise, newvar2 = sum(newvar) + 5) 

But that same approach, with sum(newvar) + 5 in the summarise() function doesn't work with dplyr...

Marc Tulla asked Feb 08 '14 23:02


2 Answers

I had a similar problem: with plyr attached after dplyr, plyr's summarise() and mutate() mask dplyr's versions and ignore the grouping. Simply detaching plyr solved it:

detach(package:plyr)
library(dplyr)
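If you need to keep plyr attached, one alternative (a sketch, not part of the original answer) is to call the dplyr verbs with an explicit namespace so masking cannot affect them. The `.groups` argument assumes dplyr 1.0.0 or later; drop it on older versions.

```r
library(dplyr)

# Namespace-qualified calls always resolve to dplyr,
# regardless of which other packages are attached afterwards.
df1 <- mtcars %>%
  dplyr::group_by(cyl, gear) %>%
  dplyr::summarise(newvar = sum(wt), .groups = "drop")

df2 <- df1 %>%
  dplyr::group_by(cyl) %>%
  dplyr::summarise(newvar2 = sum(newvar) + 5)

# Check which package currently provides the unqualified summarise():
environmentName(environment(summarise))
```

If the last line reports plyr rather than dplyr, the unqualified calls in your script are being masked.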
ManneR answered Sep 20 '22 05:09


Taking Dickoa's answer one step further: as Hadley says, "summarise peels off a single layer of grouping". It peels off grouping in the reverse of the order in which you applied it, so you can just use

mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt)) %>%
  summarise(newvar2 = sum(newvar) + 5)

Note that this will give a different answer if you use group_by(gear, cyl) in the second line.
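To make the order dependence concrete, here is a small sketch comparing the two orderings. The names `by_cyl` and `by_gear` are just illustrative, and `.groups = "drop_last"` (dplyr 1.0.0 or later) spells out the "peel off the last grouping variable" behaviour described above.

```r
library(dplyr)

# Grouped by (cyl, gear): the first summarise() drops gear,
# so the second summarise() aggregates per cyl.
by_cyl <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt), .groups = "drop_last") %>%
  summarise(newvar2 = sum(newvar) + 5)

# Grouped by (gear, cyl): the first summarise() drops cyl,
# so the second summarise() aggregates per gear instead.
by_gear <- mtcars %>%
  group_by(gear, cyl) %>%
  summarise(newvar = sum(wt), .groups = "drop_last") %>%
  summarise(newvar2 = sum(newvar) + 5)

names(by_cyl)   # grouping column left over is cyl
names(by_gear)  # grouping column left over is gear
```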

And to get your first attempt working:

df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt))

df2 <- df1 %>%
  group_by(cyl) %>%
  summarise(newvar2 = sum(newvar) + 5)
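It may also help to see why the mutate() version in the question looks ungrouped even when grouping works. This is a sketch with illustrative names (`elementwise`, `per_group`), assuming dplyr 1.0.0 or later for `.groups`: `newvar + 5` is element-wise, so every row gets the same result with or without groups, whereas an aggregate such as sum() inside mutate() is computed per group.

```r
library(dplyr)

df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt), .groups = "drop")

# Element-wise arithmetic: grouping by cyl changes nothing visible.
elementwise <- df1 %>%
  group_by(cyl) %>%
  mutate(newvar2 = newvar + 5)

# Aggregate inside mutate(): every row receives its cyl group's total,
# and the row count is unchanged (mutate() never collapses rows).
per_group <- df1 %>%
  group_by(cyl) %>%
  mutate(newvar2 = sum(newvar) + 5)
```

Use summarise() instead of mutate() when you want one row per group rather than the group total repeated on every row.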
Tim Cameron answered Sep 18 '22 05:09