
Arrange within a group with dplyr

I am using the nycflights13 package. I group flights by month and day, keep the top 3 rows by departure delay within each group, and then try to sort those rows by departure delay in descending order within each group. The code is the following:

flights %>% group_by(month, day)  %>% top_n(3, dep_delay) %>% arrange(desc(dep_delay))

Which returns the following output:

    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin  dest
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>  <chr> <chr>
1   2013     1     9      641            900      1301     1242           1530      1272      HA     51  N384HA    JFK   HNL
2   2013     6    15     1432           1935      1137     1607           2120      1127      MQ   3535  N504MQ    JFK   CMH
3   2013     1    10     1121           1635      1126     1239           1810      1109      MQ   3695  N517MQ    EWR   ORD
4   2013     9    20     1139           1845      1014     1457           2210      1007      AA    177  N338AA    JFK   SFO
5   2013     7    22      845           1600      1005     1044           1815       989      MQ   3075  N665MQ    JFK   CVG
6   2013     4    10     1100           1900       960     1342           2211       931      DL   2391  N959DL    JFK   TPA

The records are sorted in descending order but not within groups.

Why is that? What should be done to correct the code? Your advice will be appreciated.

Edit

Following the suggestions made in the comments, I still don't get what I am looking for, i.e. the top 3 records within each month-day group sorted in descending order of departure delay:

flights %>% group_by(month, day)  %>% top_n(3, dep_delay) %>% arrange(desc(month, day,  dep_delay))

    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin  dest
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>  <chr> <chr>
1   2013    12     1      657           1930       687     1010           2249       681      DL   1091  N342NW    JFK   SAT
2   2013    12     1     1504           1056       248     1628           1230       238      EV   5309  N615QX    LGA   BGR
3   2013    12     1     2017           1455       322     2146           1609       337      DL   1164  N6704Z    JFK   BOS
4   2013    12     2     1139            745       234     1358           1012       226      DL    807  N935AT    EWR   ATL
5   2013    12     2     1823           1345       278     2123           1640       283      UA   1510  N75861    EWR   IAH
6   2013    12     2     1842           1428       254     1955           1545       250      EV   5712  N827AS    JFK   IAD
7   2013    12     3      920            600       200     1158            846       192      B6    583  N535JB    JFK   MCO
8   2013    12     3     1424           1114       190     1713           1347       206      UA    405  N437UA    LGA   DEN
9   2013    12     3     2300           1935       205      116           2203       193      FL   1346  N964AT    LGA   ATL
10  2013    12     4     1210            829       221     1440           1055       225      EV   4419  N23139    EWR   XNA
rf7 asked May 07 '17 14:05



1 Answer

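arrange() ignores grouping by default, so your rows are sorted across the whole table rather than within each month-day group. You need to add .by_group = TRUE, which sorts by the grouping variables first and then by the columns you pass. Since you want descending order, also wrap dep_delay in desc(). Note that desc() takes a single column, so desc(month, day, dep_delay) does not sort by all three.

flights %>%
   group_by(month, day) %>%
   top_n(3, dep_delay) %>%
   arrange(desc(dep_delay), .by_group = TRUE)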
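
The effect of .by_group is easier to see on a small table. The sketch below is only an illustration: the `toy` tibble and its values are made up and stand in for `flights`, with a single grouping column `day` instead of month and day.

```r
library(dplyr)

# Hypothetical toy data standing in for flights: two groups, unsorted delays
toy <- tibble(
  day       = c(1, 1, 1, 2, 2, 2),
  dep_delay = c(10, 30, 20, 50, 5, 40)
)

# Default behaviour: grouping is ignored, rows are sorted across the whole table
ungrouped_sort <- toy %>%
  group_by(day) %>%
  arrange(desc(dep_delay))

# .by_group = TRUE: sort by the grouping column first, then by delay within each group
grouped_sort <- toy %>%
  group_by(day) %>%
  arrange(desc(dep_delay), .by_group = TRUE)

ungrouped_sort$dep_delay  # 50 40 30 20 10 5
grouped_sort$dep_delay    # 30 20 10 50 40 5
```

In the second result the delays restart from the largest value at each new day, which is the "descending within groups" ordering the question asks for.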

Jeff Bezos answered Sep 28 '22 07:09