I must be missing something with how group_by levels in dplyr get peeled off. In the example below, I group by 2 columns, summarize values into a single variable, then sort by that new variable:
mtcars %>%
  group_by(cyl, gear) %>%
  summarize(hp_range = max(hp) - min(mpg)) %>%
  arrange(desc(hp_range))
# Source: local data frame [8 x 3]
# Groups: cyl [3]
#
# cyl gear hp_range
# (dbl) (dbl) (dbl)
#1 4 4 87.6
#2 4 5 87.0
#3 4 3 75.5
#4 6 5 155.3
#5 6 4 105.2
#6 6 3 91.9
#7 8 5 320.0
#8 8 3 234.6
Obviously this is not sorted by hp_range as intended. What am I missing?
EDIT: The example works as expected without the call to desc in arrange. I am still unclear why.
Ok, just got to the bottom of this:
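desc was not the problem; it was only by chance that the example appeared to work without it (the ascending sort within each cyl group happens to be ascending overall). The key is that when you group_by multiple columns, summarize seems to peel off only the last grouping level, so the result is still grouped by cyl (note the "Groups: cyl [3]" line in the output above), and arrange then sorts within those groups. To get the intended sort of the entire data table, you must first ungroup and then arrange: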
mtcars %>%
  group_by(cyl, gear) %>%
  summarize(hp_range = max(hp) - min(mpg)) %>%
  ungroup() %>%
  arrange(hp_range)
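For anyone who wants to check the leftover grouping directly, here is a minimal sketch of the same idea. It assumes a reasonably recent dplyr (1.0.0 or later, for the `.groups` argument of summarize); the names `summarized` and `res` are only illustrative. Note that, as far as I can tell, newer dplyr releases make arrange ignore grouping by default, so the output shown in the question may not reproduce on a current install.

```r
library(dplyr)

# summarize() drops only the last grouping level (gear),
# so the result is still grouped by cyl.
summarized <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(hp_range = max(hp) - min(mpg), .groups = "drop_last")

# Inspect which grouping variables are left after summarize().
remaining_groups <- group_vars(summarized)
print(remaining_groups)

# Drop all grouping inside summarize(), then sort the whole table.
res <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(hp_range = max(hp) - min(mpg), .groups = "drop") %>%
  arrange(desc(hp_range))

print(res)
```

Using `.groups = "drop"` is just a shorthand for the explicit ungroup() step above; either way, arrange sees an ungrouped table and sorts all rows together.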