If I compute a grouped summary with the aggregate function and with summarise from the dplyr package, why do the two return their results in a different row order?
Example:
a <- aggregate(hp~mpg+cyl+gear, mtcars, FUN = sum)
gives me
mpg cyl gear hp
1 21.5 4 3 97
2 18.1 6 3 105
3 21.4 6 3 110
4 10.4 8 3 420
5 13.3 8 3 245
and
b <- mtcars %>%
group_by(mpg, cyl, gear) %>%
summarise(hp = sum(hp))
gives me
mpg cyl gear hp
<dbl> <dbl> <dbl> <dbl>
1 10.4 8 3 420
2 13.3 8 3 245
3 14.3 8 3 245
4 14.7 8 3 230
5 15 8 5 335
Why is the order not the same?
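As mentioned by @zx8754, tidyverse operations can re-order the rows, and there is no guarantee that you will get a particular row order: https://github.com/tidyverse/dplyr/issues/2192#issuecomment-281655703
Looking a bit more closely, aggregate sorted the result by gear, then cyl, then mpg. In other words, the last term of the formula is the primary sort key, whereas the group_by output in the question is sorted by its first variable (mpg) first.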
So the following tidyverse code gives the same row order as aggregate(hp~mpg+cyl+gear, mtcars, FUN = sum):
library(tidyverse)
mtcars %>% group_by(gear, cyl, mpg) %>% summarise(hp = sum(hp)) %>% head()
#> # A tibble: 6 x 4
#> # Groups: gear, cyl [3]
#> gear cyl mpg hp
#> <dbl> <dbl> <dbl> <dbl>
#> 1 3 4 21.5 97
#> 2 3 6 18.1 105
#> 3 3 6 21.4 110
#> 4 3 8 10.4 420
#> 5 3 8 13.3 245
#> 6 3 8 14.3 245
Created on 2019-02-27 by the reprex package (v0.2.1)
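The ordering claim for aggregate can be checked in base R alone. This is only a sketch, assuming aggregate behaves the same way in your R version; the names perm and already_sorted are made up for illustration. It sorts the aggregate result explicitly by gear, then cyl, then mpg, and tests whether any row would have to move.

```r
# Aggregate as in the question; mtcars ships with R in the datasets package.
a <- aggregate(hp ~ mpg + cyl + gear, data = mtcars, FUN = sum)

# Row permutation that sorts by gear first, then cyl, then mpg.
perm <- order(a$gear, a$cyl, a$mpg)

# TRUE means aggregate() already returned the rows in exactly that order.
already_sorted <- identical(perm, seq_len(nrow(a)))
print(already_sorted)
```

If this prints TRUE, the rows are already sorted with the last formula term as the primary key, which is why reversing the variables in group_by reproduces the same order.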
To get the same row order as mtcars %>% group_by(mpg, cyl, gear) %>% summarise(hp = sum(hp)), reverse the terms in the aggregate formula instead:
library(tidyverse)
aggregate(hp~gear+cyl+mpg, mtcars, FUN = sum) %>% head()
#> gear cyl mpg hp
#> 1 3 8 10.4 420
#> 2 3 8 13.3 245
#> 3 3 8 14.3 245
#> 4 3 8 14.7 230
#> 5 5 8 15.0 335
#> 6 3 8 15.2 330
Created on 2019-02-27 by the reprex package (v0.2.1)
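Since neither function promises a particular row order, a more robust option is to sort explicitly rather than rely on the order of the grouping variables. The following is a minimal sketch, not the only way to do it. It assumes dplyr 1.0.0 or later for the .groups argument of summarise, and same_order is a hypothetical name used only here.

```r
library(dplyr)

# Summarise with dplyr, drop the grouping, then sort explicitly.
b <- mtcars %>%
  group_by(mpg, cyl, gear) %>%
  summarise(hp = sum(hp), .groups = "drop") %>%  # assumes dplyr >= 1.0.0
  arrange(gear, cyl, mpg)

# Same summary with base R, sorted with the same keys.
a <- aggregate(hp ~ mpg + cyl + gear, data = mtcars, FUN = sum)
a <- a[order(a$gear, a$cyl, a$mpg), ]

# Compare the two results column by column.
same_order <- identical(b$mpg, a$mpg) &&
  identical(b$cyl, a$cyl) &&
  identical(b$gear, a$gear) &&
  isTRUE(all.equal(b$hp, a$hp))
print(same_order)
```

With the sort keys spelled out on both sides, the comparison no longer depends on how either function orders its groups internally.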