I am using the nycflights13 package (library(nycflights13)) with the following command to group by month and day, select the top 3 rows within each group, and then sort by departure delay in descending order within each group:
flights %>% group_by(month, day) %>% top_n(3, dep_delay) %>% arrange(desc(dep_delay))
Which returns the following output:
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl> <chr>    <int> <chr>   <chr>  <chr>
 1  2013     1     9      641            900      1301     1242           1530      1272 HA          51 N384HA  JFK    HNL
 2  2013     6    15     1432           1935      1137     1607           2120      1127 MQ        3535 N504MQ  JFK    CMH
 3  2013     1    10     1121           1635      1126     1239           1810      1109 MQ        3695 N517MQ  EWR    ORD
 4  2013     9    20     1139           1845      1014     1457           2210      1007 AA         177 N338AA  JFK    SFO
 5  2013     7    22      845           1600      1005     1044           1815       989 MQ        3075 N665MQ  JFK    CVG
 6  2013     4    10     1100           1900       960     1342           2211       931 DL        2391 N959DL  JFK    TPA
The records are sorted in descending order but not within groups.
Why is that? What should be done to correct the code? Your advice will be appreciated.
Edit
Following the suggestions made in the comments, I still don't get what I am looking for, i.e. the top 3 records within each month-day group sorted in descending order of departure delay:
flights %>% group_by(month, day) %>% top_n(3, dep_delay) %>% arrange(desc(month, day, dep_delay))
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl> <chr>    <int> <chr>   <chr>  <chr>
 1  2013    12     1      657           1930       687     1010           2249       681 DL        1091 N342NW  JFK    SAT
 2  2013    12     1     1504           1056       248     1628           1230       238 EV        5309 N615QX  LGA    BGR
 3  2013    12     1     2017           1455       322     2146           1609       337 DL        1164 N6704Z  JFK    BOS
 4  2013    12     2     1139            745       234     1358           1012       226 DL         807 N935AT  EWR    ATL
 5  2013    12     2     1823           1345       278     2123           1640       283 UA        1510 N75861  EWR    IAH
 6  2013    12     2     1842           1428       254     1955           1545       250 EV        5712 N827AS  JFK    IAD
 7  2013    12     3      920            600       200     1158            846       192 B6         583 N535JB  JFK    MCO
 8  2013    12     3     1424           1114       190     1713           1347       206 UA         405 N437UA  LGA    DEN
 9  2013    12     3     2300           1935       205      116           2203       193 FL        1346 N964AT  LGA    ATL
10  2013    12     4     1210            829       221     1440           1055       225 EV        4419 N23139  EWR    XNA
arrange() orders the rows of a data frame by the values of selected columns.
The dplyr function arrange() reorders (sorts) rows by one or more variables. Instead of wrapping a variable in desc(), you can prefix a numeric sorting variable with a minus sign to indicate descending order. Missing values are always placed at the end.
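A minimal sketch of the two descending-sort spellings, assuming dplyr is installed. The tibble df and its columns id and delay are made up for illustration and are not part of nycflights13.

```r
library(dplyr)

# Hypothetical toy data with one missing delay
df <- tibble(
  id    = c("a", "b", "c", "d"),
  delay = c(5, NA, 20, 10)
)

# Descending sort with desc()
by_desc <- df %>% arrange(desc(delay))

# Descending sort with a minus sign (numeric columns only)
by_minus <- df %>% arrange(-delay)

# In both results the row with the missing delay ("b") comes last
print(by_desc)
print(by_minus)
```

The minus-sign form only works for numeric columns, so desc() is the more general choice.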
To sort a data frame in base R, use the order() function. By default, sorting is ascending. Prefix a numeric sorting variable with a minus sign to sort in descending order.
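For comparison, a small base R sketch of the same idea with order(). The data frame df and its columns are hypothetical and only serve to show the row indexing.

```r
# Hypothetical toy data frame
df <- data.frame(
  id    = c("a", "b", "c"),
  delay = c(5, 20, 10),
  stringsAsFactors = FALSE
)

# order() returns the row indices in sorted order; use them to index rows
ascending  <- df[order(df$delay), ]

# A minus sign on a numeric column reverses the order
descending <- df[order(-df$delay), ]

print(ascending)
print(descending)
```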
The dplyr package provides the arrange() function, which sorts the rows of a data frame.
By default, arrange() ignores grouping and sorts the whole data frame. You need to add .by_group = TRUE to arrange within groups.
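library(dplyr)
library(nycflights13)

flights %>%
  group_by(month, day) %>%
  top_n(3, dep_delay) %>%
  # desc() gives the descending order asked for; .by_group = TRUE sorts by month and day first
  arrange(desc(dep_delay), .by_group = TRUE)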
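To check the within-group behaviour without loading the full flights table, here is a small sketch on made-up data. The tibble toy and its values are hypothetical; only group_by(), top_n(), arrange() and desc() come from dplyr. It also shows an equivalent spelling that lists the grouping columns explicitly in arrange().

```r
library(dplyr)

# Hypothetical stand-in for flights: two month-day groups, four rows each
toy <- tibble(
  month     = c(1, 1, 1, 1, 2, 2, 2, 2),
  day       = c(1, 1, 1, 1, 1, 1, 1, 1),
  dep_delay = c(10, 50, 30, 20, 5, 80, 40, 60)
)

# Top 3 delays per group, sorted descending within each group
result <- toy %>%
  group_by(month, day) %>%
  top_n(3, dep_delay) %>%
  arrange(desc(dep_delay), .by_group = TRUE) %>%
  ungroup()

# Same ordering, naming the grouping columns in arrange() instead
alt <- toy %>%
  group_by(month, day) %>%
  top_n(3, dep_delay) %>%
  arrange(month, day, desc(dep_delay)) %>%
  ungroup()

print(result)
```

Note that desc() takes a single vector, so each column that should be sorted in descending order needs its own desc() call.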