When I teach people how to use dplyr, I warn them not to assume that any dplyr function will preserve the row order of their data frames/tibbles unless the documentation says so. However, I have not been able to find any official documentation on the matter, which makes it harder to convince people to be careful about assuming what their code is doing. For example, the mutate() documentation explicitly guarantees that the number of rows will be preserved, but doesn't say anything about order preservation. Is there any official statement or documentation associated with dplyr (or the tidyverse) that I can point people to about what, if any, assumptions can be made regarding row order preservation?
This is from the Roxygen comments in the mutate() source code:
For mutate():
Rows are not affected.
Existing columns will be preserved unless explicitly modified.
New columns will be added to the right of existing columns.
Columns given value NULL will be removed.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes are preserved.
For transmute():
Rows are not affected.
Apart from grouping variables, existing columns will be removed unless explicitly kept.
Column order matches order of expressions.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes are preserved.
I would interpret "Rows are not affected" as saying that row order is preserved. Since it comes from the source code, I would take it as canonical.
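For teaching purposes, a quick empirical check can sit alongside the documentation. The sketch below is only an illustration, not a guarantee: it assumes dplyr is installed, and the tibble `df` and its columns `id`, `grp`, and `n_in_grp` are hypothetical names made up for the example. It runs a grouped mutate() on deliberately unsorted data and compares the row order before and after.

```r
library(dplyr)

# Hypothetical example data, deliberately not sorted by id or grp
df <- tibble(
  id  = c(3L, 1L, 2L),
  grp = c("b", "a", "b")
)

# A grouped mutate(): adds the size of each row's group as a new column
out <- df %>%
  group_by(grp) %>%
  mutate(n_in_grp = n()) %>%
  ungroup()

# Compare the row order of the result with the row order of the input
order_preserved <- identical(out$id, df$id)
print(order_preserved)
```

A check like this only shows the behaviour for one input and one installed version of dplyr, so it supports the documented statement rather than replacing it.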