When I use the dplyr function group_by() and immediately follow it with arrange(), I expect to get output where the data frame is ordered within the groups that I stated in group_by(). My reading of the documentation is that this combination should produce such a result. However, when I tried it, this is not what I got, and googling did not indicate that other people had run into the same issue. Am I wrong in expecting this result?
Here is an example, using the R built-in dataset ToothGrowth:
library(dplyr)

ToothGrowth %>%
  group_by(supp) %>%
  arrange(len)
Running this produces a data frame in which the whole data frame is ordered by len, rather than ordered by len within each level of the supp factor.
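The behaviour described above can be checked directly. This is only a minimal sketch: the names res, is_sorted_overall and n_runs are illustrative and do not come from the original post.

```r
library(dplyr)

res <- ToothGrowth %>%
  group_by(supp) %>%
  arrange(len)

# TRUE when len is non-decreasing across the whole data frame,
# i.e. the sort ignored the grouping
is_sorted_overall <- !is.unsorted(res$len)

# Number of consecutive runs of the same supp value; a result sorted
# within groups would have exactly one run per supp level
n_runs <- length(rle(as.character(res$supp))$lengths)
```

If the grouping had been respected, n_runs would equal the number of supp levels. A larger value means the supp values are interleaved.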
This is the code that produces the desired output:
ToothGrowth %>%
  group_by(supp) %>%
  do(data.frame(with(data = ., .[order(len), ])))
arrange() orders the rows of a data frame by the values of selected columns.
%>% is the forward pipe operator in R. It provides a mechanism for chaining commands: it forwards a value, or the result of an expression, into the next function call or expression. It is defined by the magrittr package (CRAN) and is heavily used by dplyr (CRAN).
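As a small illustration of that forwarding, the sketch below compares a piped call with the equivalent direct call. The names piped and direct are illustrative only. It relies on dplyr making %>% available when it is attached.

```r
library(dplyr)  # attaching dplyr also makes %>% available

# x %>% f(y) is evaluated as f(x, y)
piped  <- ToothGrowth %>% arrange(len)
direct <- arrange(ToothGrowth, len)

# Both forms should give the same data frame
same_result <- identical(piped, direct)
```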
group_by() belongs to the dplyr package in the R programming language and groups the rows of a data frame by one or more variables. On its own, group_by() does not change how the data look: it returns a grouped data frame, and the grouping takes effect in the verbs that follow, such as summarise(). It works similarly to GROUP BY in SQL and to a pivot table in Excel.
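A minimal sketch of group_by() followed by summarise(), using the same ToothGrowth data. The names grouped, means and mean_len are chosen here for illustration.

```r
library(dplyr)

grouped <- ToothGrowth %>% group_by(supp)

# The grouped data frame has the same rows plus grouping metadata
grouping_columns <- group_vars(grouped)
same_row_count   <- nrow(grouped) == nrow(ToothGrowth)

# summarise() then returns one row per group
means <- grouped %>% summarise(mean_len = mean(len))
```

Here means has one row for each level of supp, holding the mean of len for that level.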
group_by() does not reorder rows, so the order of rows within each group is preserved.
You can produce the expected behaviour by setting .by_group = TRUE in arrange():
library(dplyr)

ToothGrowth %>%
  group_by(supp) %>%
  arrange(len, .by_group = TRUE)
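To check that this gives the within-group ordering, the sketch below compares it with naming the grouping column first in arrange(), which should give the same row order. It assumes a dplyr version that supports the .by_group argument. The names by_group, explicit, same_order and sorted_within are illustrative.

```r
library(dplyr)

by_group <- ToothGrowth %>%
  group_by(supp) %>%
  arrange(len, .by_group = TRUE)

# Alternative: sort by the grouping column first, then by len
explicit <- ToothGrowth %>%
  group_by(supp) %>%
  arrange(supp, len)

# Compare the row order produced by the two approaches
same_order <- identical(by_group$len, explicit$len) &&
  identical(by_group$supp, explicit$supp)

# Check that len is non-decreasing within each supp level
sorted_within <- all(tapply(by_group$len, by_group$supp,
                            function(x) !is.unsorted(x)))
```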