Let's say I have a vector:
a <- rnorm(6000)
I want to calculate the mean of the 1st to the 60th value, then the mean of the 61st to the 120th value, and so forth. So basically I want the mean of every block of 60 values, giving me 100 means from that vector. I know I can do this with a for loop, but I'd like to know if there is a better way.
I would use
colMeans(matrix(a, 60))
.colMeans(a, 60, length(a) / 60) # more efficient (without reshaping to matrix)
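The reshaping trick works because matrix() fills its values column by column, so each column ends up holding one consecutive block. A minimal sketch on a toy vector (the names x, m, block_means and block_means_fast are illustrative only, not from the answer):

```r
# Toy vector: 12 values, to be averaged in blocks of 4
x <- c(1, 2, 3, 4, 10, 20, 30, 40, 100, 200, 300, 400)

# matrix() fills column-wise, so each column is one consecutive block
m <- matrix(x, nrow = 4)

# One mean per column, i.e. per block of 4 values (2.5, 25 and 250 here)
block_means <- colMeans(m)
print(block_means)

# .colMeans() takes the plain vector plus the dimensions,
# so the intermediate matrix is never built
block_means_fast <- .colMeans(x, 4, length(x) / 4)
print(block_means_fast)
```

Both calls assume the length of the vector is an exact multiple of the block size.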
Enhancement on user adunaic's request
This only works if there are 60x100 data points. If you have an incomplete 60 at the end then this errors. It would be good to have a general solution for others looking at this problem for ideas.
BinMean <- function (vec, every, na.rm = FALSE) {
  n <- length(vec)
  x <- .colMeans(vec, every, n %/% every, na.rm)
  r <- n %% every
  if (r) x <- c(x, mean.default(vec[(n - r + 1):n], na.rm = na.rm))
  x
}
a <- 1:103
BinMean(a, every = 10)
# [1] 5.5 15.5 25.5 35.5 45.5 55.5 65.5 75.5 85.5 95.5 102.0
Alternative solution with group-by operation (less efficient)
BinMean2 <- function (vec, every, na.rm = FALSE) {
  grp <- as.integer(ceiling(seq_along(vec) / every))
  grp <- structure(grp, class = "factor",
                   levels = as.character(seq_len(grp[length(grp)])) )
  lst <- .Internal(split(vec, grp))
  unlist(lapply(lst, mean.default, na.rm = na.rm), use.names = FALSE)
}
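For readers who would rather not call .Internal() directly, the same group-by idea can be sketched with the public split() function. This is an assumption-laden variant, not the answer's code: the helper name bin_mean_split is hypothetical, and it is likely slower than BinMean2 because split() does the factor conversion itself.

```r
# Hypothetical helper: group-by binning using only documented functions
bin_mean_split <- function(vec, every, na.rm = FALSE) {
  # Group index: 1 for the first `every` values, 2 for the next, and so on
  grp <- ceiling(seq_along(vec) / every)
  # split() converts the numeric index to a factor with sorted levels,
  # so the chunks come back in their original order
  chunks <- split(vec, grp)
  # One mean per chunk; unname() drops the group labels
  unname(vapply(chunks, mean, numeric(1), na.rm = na.rm))
}

res <- bin_mean_split(1:103, every = 10)
print(res)
```

On 1:103 this should reproduce the BinMean output shown earlier, including the mean of the incomplete last bin.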
Speed
library(microbenchmark)
a <- runif(1e+4)
microbenchmark(BinMean(a, 100), BinMean2(a, 100))
#Unit: microseconds
# expr min lq mean median uq max
# BinMean(a, 100) 40.400 42.1095 54.21286 48.3915 57.6555 205.702
# BinMean2(a, 100) 1216.823 1335.7920 1758.90267 1434.9090 1563.1535 21467.542
I recommend sapply:
a <- rnorm(6000)
# Start index of each block of 60 (named `starts` to avoid masking base::seq)
starts <- seq(1, length(a), 60)
a_mean <- sapply(starts, function(i) mean(a[i:(i + 59)]))
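Note that indexing with i:(i+59) assumes the length of the vector is a multiple of 60; past the end of the vector the index returns NA, so the last mean would be NA. A minimal sketch that clamps the last block instead (x, size, starts and chunk_means are illustrative names, and vapply is swapped in for sapply to fix the return type):

```r
set.seed(1)
x <- rnorm(130)   # 130 is deliberately not a multiple of 60
size <- 60

# Start index of every block: 1, 61, 121
starts <- seq(1, length(x), by = size)

chunk_means <- vapply(starts, function(i) {
  # Clamp the end of the block so the last, shorter block is handled
  end <- min(i + size - 1, length(x))
  mean(x[i:end])
}, numeric(1))

print(chunk_means)
```

The third value is the mean of the remaining 10 elements rather than NA.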
Another option is to use tapply by creating a grouping variable. The grouping variable can be created in two ways:
1) Using rep
tapply(a, rep(seq_along(a), each = n, length.out = length(a)), mean)
2) Using gl
tapply(a, gl(length(a)/n, n), mean)
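The gl() call assumes length(a) is an exact multiple of n. When that may not hold, one possible grouping variable is ceiling(seq_along(x) / n), sketched below on a small vector (x, n, grp and res are illustrative names):

```r
x <- 1:103   # length is not a multiple of the block size
n <- 10

# 1 for the first n values, 2 for the next n, ..., 11 for the last 3
grp <- ceiling(seq_along(x) / n)

# tapply returns a named array, one mean per group
res <- tapply(x, grp, mean)
print(res)
```

The last group simply contains fewer values, so its mean is taken over the 3 remaining elements.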
If we convert the vector to a data frame or tibble, we can use the same grouping logic to calculate the means:
aggregate(a~gl(length(a)/n, n), data.frame(a), mean)
Or with dplyr:
library(dplyr)
tibble::tibble(a) %>%
  group_by(group = gl(length(a)/n, n)) %>%
  summarise(mean_val = mean(a))
Data
set.seed(1234)
a <- rnorm(6000)
n <- 60