I have a data.frame that contains client names, years, and several revenue numbers from each year.
df <- data.frame(client = rep(c("Client A","Client B", "Client C"),3), year = rep(c(2014,2013,2012), each=3), rev = rep(c(10,20,30),3) )
I want to end up with a data.frame that aggregates the revenue by client and year. I then want to sort the data.frame by year then by descending revenue.
library(dplyr)
df1 <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev)) %>%
  arrange(year, desc(tot))
However, with the code above, the arrange() function doesn't change the order of the grouped data.frame at all. When I run the code below and coerce to a normal data.frame, it works.
library(dplyr)
df1 <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev)) %>%
  data.frame() %>%
  arrange(year, desc(tot))
Am I missing something, or will I need to do this every time I try to arrange a grouped_df by a grouping variable?
R Version: 3.1.1 dplyr package version: 0.3.0.2
EDIT 11/13/2017: As noted by lucacerone, beginning with dplyr 0.5, arrange once again ignores groups when sorting. So my original code now works in the way I initially expected it would.
arrange() once again ignores grouping, reverting back to the behaviour of dplyr 0.3 and earlier. This makes arrange() inconsistent with other dplyr verbs, but I think this behaviour is generally more useful. Regardless, it’s not going to change again, as more changes will just cause more confusion.
To sort a data frame in base R, use the order() function. By default, sorting is ascending. Prefix a numeric sorting variable with a minus sign to sort it in descending order.
With dplyr, use arrange() to order rows. For example, to sort a data frame called teams by teamID, run arrange(teams, teamID). To sort in descending order, wrap the variable in desc(): to sort by year in descending order, run arrange(teams, desc(yearID)).
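As a rough illustration of the order() approach, here is a minimal base R sketch applied to the data frame from the question. It sorts the raw rows by year ascending and then by rev descending; it does not aggregate, and the name sorted is only illustrative.

```r
# Data frame from the question
df <- data.frame(
  client = rep(c("Client A", "Client B", "Client C"), 3),
  year   = rep(c(2014, 2013, 2012), each = 3),
  rev    = rep(c(10, 20, 30), 3)
)

# order() returns the row permutation: year ascending, then rev descending
# (the minus sign only works for numeric columns)
sorted <- df[order(df$year, -df$rev), ]
sorted
```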
Try switching the order of your group_by statement:
df %>% group_by(year, client) %>% summarise(tot = sum(rev)) %>% arrange(year, desc(tot))
I think arrange is ordering within groups; after summarize, the last grouping variable is dropped, so in your first example it's arranging rows within the client group. Switching the order to group_by(year, client) seems to fix it because the client group gets dropped after summarize.
Alternatively, there is the ungroup() function:
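To check which grouping survives summarise(), one option is to inspect the result with group_vars(). This is a small sketch using the question's data; it assumes a dplyr version that provides group_vars(), and the object names by_client_year and by_year_client are only illustrative.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  client = rep(c("Client A", "Client B", "Client C"), 3),
  year   = rep(c(2014, 2013, 2012), each = 3),
  rev    = rep(c(10, 20, 30), 3)
)

# summarise() peels off the last grouping variable by default
by_client_year <- df %>% group_by(client, year) %>% summarise(tot = sum(rev))
by_year_client <- df %>% group_by(year, client) %>% summarise(tot = sum(rev))

group_vars(by_client_year)  # "client": the result is still grouped by client
group_vars(by_year_client)  # "year": the result is still grouped by year
```

Recent dplyr releases may also print a message about the remaining grouping when summarise() runs; that message is informational and does not change the result.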
df %>% group_by(client, year) %>% summarise(tot = sum(rev)) %>% ungroup() %>% arrange(year, desc(tot))
Edit, @lucacerone: since dplyr 0.5 this does not work anymore:
Breaking changes: arrange() once again ignores grouping, reverting back to the behaviour of dplyr 0.3 and earlier. This makes arrange() inconsistent with other dplyr verbs, but I think this behaviour is generally more useful. Regardless, it’s not going to change again, as more changes will just cause more confusion.
Recent versions of dplyr (at least from dplyr_0.7.4) allow you to arrange within groups. You just have to set .by_group = TRUE in the arrange() call. More information is available here. In your example, try:
library(dplyr)
df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev)) %>%
  arrange(desc(tot), .by_group = TRUE)
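Note that with group_by(client, year) the summarised result is still grouped by client, so .by_group = TRUE sorts by client first. If the goal is the order asked for in the question (by year, then by descending revenue), one possible variant is to group by year first. This is a sketch under the assumption that the installed dplyr supports .by_group; the name res is only illustrative.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  client = rep(c("Client A", "Client B", "Client C"), 3),
  year   = rep(c(2014, 2013, 2012), each = 3),
  rev    = rep(c(10, 20, 30), 3)
)

res <- df %>%
  group_by(year, client) %>%
  summarise(tot = sum(rev)) %>%          # result remains grouped by year
  arrange(desc(tot), .by_group = TRUE)   # sort by year, then by descending tot

res
```

Because arrange() ignores grouping by default in dplyr 0.5 and later, arrange(year, desc(tot)) without .by_group should give the same row order here.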