I expected the code below to output a data frame with three rows, one per value of cyl, where each row holds the cumulative mean of mpg over that group and all the groups before it:
library(dplyr)
mtcars %>%
  arrange(cyl) %>%
  group_by(cyl) %>%
  summarise(running.mean.mpg = cummean(mpg))
This is what I expected to happen:
mean_cyl_4 <- mtcars %>%
  filter(cyl == 4) %>%
  summarise(mean(mpg))
mean_cyl_4_6 <- mtcars %>%
  filter(cyl == 4 | cyl == 6) %>%
  summarise(mean(mpg))
mean_cyl_4_6_8 <- mtcars %>%
  filter(cyl == 4 | cyl == 6 | cyl == 8) %>%
  summarise(mean(mpg))
data.frame(cyl = c(4, 6, 8),
           running.mean.mpg = c(mean_cyl_4[1, 1], mean_cyl_4_6[1, 1], mean_cyl_4_6_8[1, 1]))
  cyl running.mean.mpg
1   4         26.66364
2   6         23.97222
3   8         20.09062
How come dplyr seems to ignore group_by(cyl)?
require("dplyr")
mtcars %>%
  arrange(cyl) %>%
  group_by(cyl) %>%
  # mutate() keeps one row per car, so cummean() restarts within each cyl group
  mutate(running.mean.mpg = cummean(mpg)) %>%
  select(cyl, running.mean.mpg)
# Source: local data frame [32 x 2]
# Groups: cyl
#
#   cyl running.mean.mpg
# 1   4         22.80000
# 2   4         23.60000
# 3   4         23.33333
# 4   4         25.60000
# 5   4         26.56000
# 6   4         27.78333
# 7   4         26.88571
# 8   4         26.93750
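The mutate() call gives a running mean inside each cyl group, which is not quite the three-row result the question asked for. If the goal is a mean that accumulates across the groups, one possible approach is to reduce each group to its sum and count first, and only then take cumulative sums over the three summary rows. This is a minimal sketch, not necessarily the only way to do it; the names running, total.mpg and n.cars are made up for illustration.

```r
library(dplyr)

# Step 1: one row per cyl with the sum of mpg and the number of cars.
# Step 2: summarise() drops the grouping here (only one grouping variable),
#         so cumsum() in mutate() runs across all three rows.
running <- mtcars %>%
  group_by(cyl) %>%
  summarise(total.mpg = sum(mpg), n.cars = n()) %>%
  arrange(cyl) %>%
  mutate(running.mean.mpg = cumsum(total.mpg) / cumsum(n.cars)) %>%
  select(cyl, running.mean.mpg)

running
```

Dividing the cumulative sum by the cumulative count, rather than averaging the three group means, keeps the result weighted by group size, which is what the filter() based version in the question computes.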
For the sake of experimentation, this would also work with data.table. Note that dplyr still has to be loaded as well, because cummean() comes from dplyr.
require("data.table")
DT <- as.data.table(mtcars)
DT[, j = list(
  running.mean.mpg = cummean(mpg)
), by = "cyl"]
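If you would rather not load dplyr just for cummean(), a base R expression can probably stand in for it. A minimal sketch, assuming cumsum(mpg) / seq_along(mpg) is an acceptable replacement for the running mean; the name res is only for illustration.

```r
library(data.table)

DT <- as.data.table(mtcars)

# cumsum(x) / seq_along(x) is the running mean of x, computed here per cyl group
res <- DT[, list(running.mean.mpg = cumsum(mpg) / seq_along(mpg)), by = "cyl"]

res
```

As with the mutate() version, this returns one row per car, and the last value in each group equals that group's plain mean of mpg.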