I must be missing something about how group_by levels in dplyr get peeled off. In the example below, I group by 2 columns, summarize values into a single variable, then sort by that new variable:
mtcars %>% group_by( cyl, gear ) %>%
summarize( hp_range = max(hp) - min(mpg)) %>%
arrange( desc(hp_range) )
# Source: local data frame [8 x 3]
# Groups: cyl [3]
#
# cyl gear hp_range
# (dbl) (dbl) (dbl)
#1 4 4 87.6
#2 4 5 87.0
#3 4 3 75.5
#4 6 5 155.3
#5 6 4 105.2
#6 6 3 91.9
#7 8 5 320.0
#8 8 3 234.6
Obviously this is not sorted by hp_range as intended. What am I missing?
EDIT: The example works as expected without the call to desc in arrange. It is still unclear to me why.
Ok, just got to the bottom of this:
desc was not the culprit; it was only by chance that the example appeared to work without it. The key is that when you group_by multiple columns, summarize peels off only the last grouping level, so the result is still grouped by the remaining columns (here cyl, as the "Groups: cyl [3]" line in the output shows). It seems that arrange then sorts within those groups rather than across the whole table. To get the intended sort of the entire data table, you must first ungroup and then arrange:
mtcars %>% group_by( cyl, gear ) %>%
summarize( hp_range = max(hp) - min(mpg)) %>%
ungroup() %>%
arrange( hp_range )
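To see which grouping levels are left after summarize, here is a minimal sketch. It assumes a dplyr version that provides group_vars(), which is not used in the original post, and it uses desc() on the assumption that the descending order from the question is what is wanted. The names by_cyl_gear and sorted are only illustrative.

```r
library(dplyr)

# Group by two columns; summarize() drops only the last level (gear)
by_cyl_gear <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(hp_range = max(hp) - min(mpg))

# Inspect the grouping that is still attached to the result
group_vars(by_cyl_gear)

# Remove the remaining grouping, then sort the whole table
sorted <- by_cyl_gear %>%
  ungroup() %>%
  arrange(desc(hp_range))

# After ungroup() there are no grouping variables left
group_vars(sorted)
```

Newer dplyr releases appear to make arrange ignore grouping unless .by_group = TRUE is passed, so the ungroup step may be redundant there, but it is harmless and makes the intent explicit.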