
Arrange a grouped_df by group variable not working

I have a data.frame that contains client names, years, and several revenue numbers from each year.

    df <- data.frame(
      client = rep(c("Client A", "Client B", "Client C"), 3),
      year = rep(c(2014, 2013, 2012), each = 3),
      rev = rep(c(10, 20, 30), 3)
    )

I want to end up with a data.frame that aggregates the revenue by client and year. I then want to sort the data.frame by year then by descending revenue.

    library(dplyr)

    df1 <- df %>%
      group_by(client, year) %>%
      summarise(tot = sum(rev)) %>%
      arrange(year, desc(tot))

However, with the code above, arrange() doesn't change the order of the grouped data.frame at all. When I run the code below, which coerces the result to a plain data.frame before sorting, it works.

    library(dplyr)

    df1 <- df %>%
      group_by(client, year) %>%
      summarise(tot = sum(rev)) %>%
      data.frame() %>%
      arrange(year, desc(tot))

Am I missing something or will I need to do this every time when trying to arrange a grouped_df by a grouped variable?

R version: 3.1.1; dplyr package version: 0.3.0.2

EDIT 11/13/2017: As noted by lucacerone, beginning with dplyr 0.5, arrange once again ignores groups when sorting. So my original code now works in the way I initially expected it would.

arrange() once again ignores grouping, reverting back to the behaviour of dplyr 0.3 and earlier. This makes arrange() inconsistent with other dplyr verbs, but I think this behaviour is generally more useful. Regardless, it’s not going to change again, as more changes will just cause more confusion.
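The behaviour described in this edit can be checked with a short sketch. It assumes dplyr 0.5 or later is installed; the data and the name df1 follow the code in the question, and the ordering described in the comments applies only to this example data.

```r
library(dplyr)

df <- data.frame(
  client = rep(c("Client A", "Client B", "Client C"), 3),
  year   = rep(c(2014, 2013, 2012), each = 3),
  rev    = rep(c(10, 20, 30), 3)
)

# The result of summarise() is still grouped by client, but arrange()
# (dplyr >= 0.5) ignores that grouping and sorts the whole table.
df1 <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev)) %>%
  arrange(year, desc(tot))

# Rows come out by ascending year, then by descending tot within each year.
as.data.frame(df1)
```

On older versions where arrange() respected groups, the same pipeline would sort only within each client group, which is the behaviour the question originally ran into.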

TBT8 asked Oct 24 '14 19:10




2 Answers

Try switching the order of your group_by statement:

    df %>%
      group_by(year, client) %>%
      summarise(tot = sum(rev)) %>%
      arrange(year, desc(tot))

I think arrange() is ordering within groups. summarise() drops only the last grouping variable, so in your first example the result is still grouped by client, and arrange() sorts the rows within each client group rather than across the whole table. Switching the order to group_by(year, client) seems to fix it because client is the variable that gets dropped after summarise(), leaving the data grouped by year.

Alternatively, you can remove the remaining grouping with ungroup() before calling arrange():

    df %>%
      group_by(client, year) %>%
      summarise(tot = sum(rev)) %>%
      ungroup() %>%
      arrange(year, desc(tot))
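To see which grouping is left after summarise(), the result can be inspected directly. This is a minimal sketch, not part of the original answer: it assumes a dplyr version that provides group_vars() (0.7.0 or later, to my knowledge), and the names by_client_first and by_year_first are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  client = rep(c("Client A", "Client B", "Client C"), 3),
  year   = rep(c(2014, 2013, 2012), each = 3),
  rev    = rep(c(10, 20, 30), 3)
)

# summarise() peels off only the last grouping variable.
by_client_first <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev))

by_year_first <- df %>%
  group_by(year, client) %>%
  summarise(tot = sum(rev))

group_vars(by_client_first)           # "client": still grouped by client
group_vars(by_year_first)             # "year": still grouped by year
group_vars(ungroup(by_client_first))  # character(0): no grouping left
```

Which variable remains grouped is what determined the sort order in dplyr versions where arrange() worked within groups.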

Edit, @lucacerone: since dplyr 0.5 the explanation above no longer applies, because arrange() ignores grouping again. From the release notes, under "Breaking changes":

arrange() once again ignores grouping, reverting back to the behaviour of dplyr 0.3 and earlier. This makes arrange() inconsistent with other dplyr verbs, but I think this behaviour is generally more useful. Regardless, it’s not going to change again, as more changes will just cause more confusion.

Kara Woo answered Sep 18 '22 16:09


Recent versions of dplyr (at least since dplyr 0.7.4) can arrange within groups. You just have to set .by_group = TRUE in the arrange() call. More information is available in the arrange() documentation. In your example, try:

    library(dplyr)

    df %>%
      group_by(client, year) %>%
      summarise(tot = sum(rev)) %>%
      arrange(desc(tot), .by_group = TRUE)
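The effect of .by_group can be seen by running the same arrange() call with and without it. This is a sketch under the assumption that dplyr 0.7.4 or later is installed; the names totals, within_groups, and whole_table are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  client = rep(c("Client A", "Client B", "Client C"), 3),
  year   = rep(c(2014, 2013, 2012), each = 3),
  rev    = rep(c(10, 20, 30), 3)
)

# After summarise(), the remaining grouping variable is client.
totals <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev))

# .by_group = TRUE sorts by the grouping variable (client) first,
# then by desc(tot) within each client.
within_groups <- totals %>% arrange(desc(tot), .by_group = TRUE)

# Without it, arrange() ignores the grouping and sorts the whole table by tot.
whole_table <- totals %>% arrange(desc(tot))
```

Because the remaining group here is client, .by_group = TRUE orders the rows by client first. For the ordering asked for in the question (by year, then descending revenue), arrange(year, desc(tot)) without .by_group should be enough on these versions.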

nghauran answered Sep 21 '22 16:09