dplyr: Arrange not behaving as expected after group_by and summarize

Question

I must be missing something with how group_by levels in dplyr get peeled off. In the example below, I group by 2 columns, summarize values into a single variable, then sort by that new variable:

mtcars %>% group_by( cyl, gear ) %>% 
  summarize( hp_range = max(hp) - min(mpg)) %>% 
  arrange( desc(hp_range) )

# Source: local data frame [8 x 3]
# Groups: cyl [3]
#
#    cyl  gear  hp_range
#  (dbl) (dbl) (dbl)
#1     4     4  87.6
#2     4     5  87.0
#3     4     3  75.5
#4     6     5 155.3
#5     6     4 105.2
#6     6     3  91.9
#7     8     5 320.0
#8     8     3 234.6

Obviously this is not sorted by hp_range as intended. What am I missing?

EDIT: The example works as expected without the call to desc in arrange. Still unclear why?

zimmeee · Accepted Answer

Ok, just got to the bottom of this:

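The call to desc was not the problem; it was only by chance that the example appeared to work without it. In ascending order the hp_range values of the three cyl groups happen not to overlap, so sorting within each group looks the same as sorting the whole table.
The key is that summarize peels off only the last grouping level (gear), so the result is still grouped by cyl (note the "Groups: cyl [3]" line in the output above). In that output, arrange has sorted by cyl first and by hp_range only within each cyl group. To sort the entire table as intended, you must first ungroup and then arrange: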
```
library(dplyr)

mtcars %>% group_by( cyl, gear ) %>%
   summarize( hp_range = max(hp) - min(mpg)) %>%
   ungroup() %>%          # drop the remaining cyl grouping
   arrange( hp_range )    # ascending; use desc(hp_range) for descending
```
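
To check the grouping yourself, here is a minimal sketch that inspects what summarize leaves behind and then sorts in descending order, as the question intended. The names by_cyl_gear and sorted are just illustrative, and group_vars() assumes a reasonably recent dplyr:

```r
library(dplyr)

# Same summary as in the question; summarize() drops only the last
# grouping level (gear), so the result stays grouped by cyl.
by_cyl_gear <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(hp_range = max(hp) - min(mpg))

group_vars(by_cyl_gear)   # shows which grouping columns remain

# Removing the leftover grouping first makes arrange() sort the whole table.
sorted <- by_cyl_gear %>%
  ungroup() %>%
  arrange(desc(hp_range))

sorted
```

As far as I know, newer dplyr releases make arrange ignore grouping unless you pass .by_group = TRUE, and summarize there accepts .groups = "drop" to return an ungrouped result directly. In those versions the original pipeline may already sort the whole table, but the explicit ungroup() states the intent either way.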
