
Is there any explicit guarantee that dplyr operations preserve row order?

Tags:

r

dplyr

tidyverse

When I teach people how to use dplyr, I warn them not to assume that any dplyr function preserves the row order of their data frames/tibbles unless the documentation says so. However, I have not been able to find any official documentation on the matter, which makes it harder to convince people to be careful about what they assume their code is doing. For example, the documentation for mutate() explicitly guarantees that the number of rows will be preserved, but says nothing about order preservation. Is there any official statement or documentation associated with dplyr (or the tidyverse) that I can point people to, describing which assumptions, if any, can be made about row order preservation in its functions?

anjama asked Feb 11 '20 16:02



1 Answer

This is from the Roxygen comments in the mutate source code:

For mutate():

  • Rows are not affected.

  • Existing columns will be preserved unless explicitly modified.

  • New columns will be added to the right of existing columns.

  • Columns given value NULL will be removed.

  • Groups will be recomputed if a grouping variable is mutated.

  • Data frame attributes are preserved.

For transmute():

  • Rows are not affected.

  • Apart from grouping variables, existing columns will be removed unless explicitly kept.

  • Column order matches order of expressions.

  • Groups will be recomputed if a grouping variable is mutated.

  • Data frame attributes are preserved.

I would interpret "Rows are not affected" as saying that row order is preserved. Since this comes from the source code, I would take it as canonical.
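For anyone who wants to check this on their own data rather than rely on the wording alone, here is a minimal sketch. The data frame and its column names (id, x, g) are made up for illustration; the check simply compares a key column before and after mutate(), both ungrouped and grouped. It shows the behaviour on one toy example and is not a proof of a general guarantee.

```r
library(dplyr)

# Hypothetical toy data, deliberately not sorted by id
df <- tibble(id = c(3L, 1L, 2L), x = c(30, 10, 20))

# Plain mutate(): add a column, then compare the key column
out <- df %>% mutate(y = x * 2)
stopifnot(identical(out$id, df$id))

# Grouped mutate(): the group summary is written back
# into each row's original position
out_grouped <- df %>%
  mutate(g = c("b", "a", "b")) %>%
  group_by(g) %>%
  mutate(m = mean(x)) %>%
  ungroup()
stopifnot(identical(out_grouped$id, df$id))
```

If either stopifnot() call fails, the row order changed. The same before/after comparison can be wrapped around any other verb whose ordering behaviour you are unsure about.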

Allan Cameron answered Oct 14 '22 09:10
