Possible Duplicate: apply a function over groups of columns
I have a data.frame with 30 rows and many columns (1000+), and I need to average every 16 columns together. For example, the data frame looks like this (truncated to keep it readable):
Col1   Col2   Col3   Col4   ...
4.176  4.505  4.048  4.489  ...
6.167  6.184  6.359  6.444  ...
5.829  5.739  5.961  5.764  ...
...
I cannot use aggregate() because I do not have a grouping list, so I tried:
a <- data.frame(rowMeans(my.df[, 1:length(my.df)]))
which gives me the average of all 1000+ columns. Is there any way to do this for every 16 columns until the end? (The total number of columns is a multiple of 16.)
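One compact alternative, offered as a sketch rather than the answer's method: because R stores a matrix column by column, a matrix whose column count is an exact multiple of 16 (as stated above) can be reshaped into a three-dimensional array of rows x 16 x groups and averaged over the middle dimension. The matrix `m` is hypothetical sample data standing in for `my.df`; a data frame would first need `as.matrix()`.

```r
# Hypothetical sample data: 30 rows, 64 columns (a multiple of 16)
set.seed(1)
m <- matrix(runif(30 * 64), nrow = 30)

block <- 16
n.groups <- ncol(m) / block  # assumes ncol(m) is an exact multiple of 'block'

# Column-major storage means element [r, k, g] of the array is m[r, (g - 1) * block + k]
arr <- array(m, dim = c(nrow(m), block, n.groups))

# Average over the 16 columns of each block, keeping rows (dim 1) and groups (dim 3)
avg <- apply(arr, c(1, 3), mean)
dim(avg)  # 30 4
```

The reshape avoids building an index list, but it only works when every block has exactly the same number of columns.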
A secondary, less important point that would also be useful to solve: the column names have the following structure:
XXYY4ZZZ.txt
Once the columns are averaged, all I need is a new column name containing only XXYY, since the rest is averaged out. I know I could use gsub, but is there an optimal way to do the averaging and the renaming in one go?
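For the renaming on its own, a sketch assuming (hypothetically) that XXYY is always the first four characters of each name and that all 16 columns in a block share the same prefix. It averages each block of 16 and then takes the prefix from the first column of each block; if the prefix length varies, `sub()` with a regular expression would replace `substr()`.

```r
# Hypothetical column names following the XXYY4ZZZ.txt pattern, 16 per block
block <- 16
nm <- c(paste0("AB01", "4", sprintf("%03d", 1:16), ".txt"),
        paste0("CD02", "4", sprintf("%03d", 1:16), ".txt"))
set.seed(2)
df <- as.data.frame(matrix(runif(30 * length(nm)), nrow = 30))
names(df) <- nm

# Group label for each column: 1 for columns 1-16, 2 for columns 17-32, ...
groups <- ceiling(seq_len(ncol(df)) / block)
avg <- sapply(split(seq_len(ncol(df)), groups),
              function(i) rowMeans(df[, i, drop = FALSE]))

# Take XXYY from the first column of each block (assumes a 4-character prefix)
colnames(avg) <- substr(names(df)[seq(1, ncol(df), by = block)], 1, 4)
colnames(avg)  # "AB01" "CD02"
```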
I am still relatively new to R and therefore I am not sure where and how to find the answer.
Here is an example adapted from @ben's question and @TylerRinker's answer to "apply a function over groups of columns". It applies any function to a matrix or data frame over intervals of columns.
# Create sample data for reproducible example
n <- 1000
set.seed(1234)
x <- matrix(runif(30 * n), ncol = n)

# Function to apply 'fun' to object 'x' over every 'by' columns
# Alternatively, 'by' may be a vector of groups
byapply <- function(x, by, fun, ...)
{
    # Create index list
    if (length(by) == 1)
    {
        nc <- ncol(x)
        # Group labels 1,1,...,2,2,...; the last group is shorter if nc is not a multiple of 'by'
        split.index <- rep(1:ceiling(nc / by), each = by, length.out = nc)
    } else # 'by' is a vector of groups
    {
        nc <- length(by)
        split.index <- by
    }
    index.list <- split(seq(from = 1, to = nc), split.index)

    # Pass index list to fun using sapply() and return object;
    # drop = FALSE keeps a one-column group as a matrix
    sapply(index.list, function(i)
    {
        do.call(fun, list(x[, i, drop = FALSE], ...))
    })
}
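To see what the index list built inside byapply looks like, here is the same construction for a hypothetical 5-column input grouped by 2. The last group holds only one column because 5 is not a multiple of 2.

```r
nc <- 5
by <- 2

# One group label per column, recycled in runs of 'by' and cut to nc entries
split.index <- rep(1:ceiling(nc / by), each = by, length.out = nc)
split.index  # 1 1 2 2 3

# Column positions grouped by label: "1" -> 1:2, "2" -> 3:4, "3" -> 5
index.list <- split(seq_len(nc), split.index)
str(index.list)
```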
# Run function
y <- byapply(x, 16, rowMeans)
# Test to make sure it returns expected result
y.test <- rowMeans(x[, 17:32])
all.equal(y[, 2], y.test)
# TRUE
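One caveat worth checking: when the number of columns is not a multiple of the group size, the last group can hold a single column. Subsetting with `x[, i]` and a single index drops the result to a plain vector, which `rowMeans()` rejects because it needs at least two dimensions. A small demonstration with hypothetical data; indexing with `drop = FALSE` keeps the one-column matrix.

```r
set.seed(3)
m <- matrix(runif(30 * 33), nrow = 30)  # 33 columns: groups of 16, 16 and 1

last <- 33
is.null(dim(m[, last]))  # TRUE: a single column drops to a vector

# rowMeans() on a plain vector raises an error, caught here for illustration
res <- tryCatch(rowMeans(m[, last]), error = function(e) "error")
res  # "error"

# With drop = FALSE the 30 x 1 matrix is kept and rowMeans() works
ok <- rowMeans(m[, last, drop = FALSE])
```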
You can do other things with it too. For example, if you need the total sum of every 10 columns, removing NAs if present:
y.sums <- byapply(x, 10, sum, na.rm = TRUE)
y.sums[1]
# 146.7756
sum(x[, 1:10], na.rm = TRUE)
# 146.7756
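Because the sum over a block of columns is just the sum of its per-column sums, the same totals can also be sketched without byapply (a swapped-in technique, not the answer's function): sum each column once with `colSums()`, then add those sums within groups of 10 with `tapply()`.

```r
set.seed(1234)
x <- matrix(runif(30 * 1000), ncol = 1000)

col.totals <- colSums(x, na.rm = TRUE)          # one total per column
groups <- ceiling(seq_along(col.totals) / 10)   # 1 x10, 2 x10, ...
block.totals <- tapply(col.totals, groups, sum) # one total per block of 10

block.totals[1]
```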
Or find the row-wise standard deviations within each group of 10 columns:
byapply(x, 10, apply, 1, sd)
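Since `apply(..., 1, sd)` loops over rows, a vectorised alternative (a swapped-in technique, shown for a single block) uses the sample-variance identity var = (sum(x^2) - sum(x)^2 / k) / (k - 1) for k values. It can lose precision when the values are large relative to their spread, so treat it as a sketch and compare against `sd()` on your own data.

```r
set.seed(1234)
x <- matrix(runif(30 * 1000), ncol = 1000)

# Row-wise sample standard deviation of a matrix, computed with rowSums()
row.sd <- function(m) {
  k <- ncol(m)
  s <- rowSums(m)       # sum of each row
  ss <- rowSums(m^2)    # sum of squares of each row
  sqrt((ss - s^2 / k) / (k - 1))
}

block <- x[, 1:10]               # first block of 10 columns
sd.fast <- row.sd(block)
sd.loop <- apply(block, 1, sd)
all.equal(sd.fast, sd.loop)      # TRUE (up to floating-point tolerance)
```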
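Update
by can also be specified as a vector of group labels, one per column. Only the first length(by) columns are used, so this call averages columns 1 to 100 in groups of 10:
byapply(x, rep(1:10, each = 10), rowMeans)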
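Building on the vector-of-groups idea, the averaging and renaming from the question can be combined by deriving the groups from the column names themselves. A sketch using base R's `split.default()`, which splits a data frame's columns; it assumes (hypothetically) that XXYY is always the first four characters of each name. Note that `split()` orders the groups alphabetically by prefix, not by column position.

```r
# Hypothetical data with names following the XXYY4ZZZ.txt pattern
nm <- c(paste0("CD02", "4", sprintf("%03d", 1:16), ".txt"),
        paste0("AB01", "4", sprintf("%03d", 1:16), ".txt"))
set.seed(4)
df <- as.data.frame(matrix(runif(30 * length(nm)), nrow = 30))
names(df) <- nm

prefix <- substr(names(df), 1, 4)    # assumes XXYY is always 4 characters
pieces <- split.default(df, prefix)  # list of data frames, one per prefix
avg <- sapply(pieces, rowMeans)      # 30 x 2 matrix, columns named by prefix
colnames(avg)                        # "AB01" "CD02" (alphabetical)
```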