I want to use the ave
function on many columns (tens) on the data frame:
ave(df[,the_cols], df[,c('site', 'month')], FUN = mean)
The problem is that ave
runs the mean
function on all the_cols
columns together. Is there any way to run it for each of the_cols
columns separately?
I tried to look at the other functions. tapply
and aggregate
are different, they return only one row per group. I need the ave
behaviour, i.e. to return the same number of rows as given in the original df
. There is also a by
function, but using it would be very clumsy as it returns a complicated list structure that would have to be converted somehow.
Certainly many clumsy and ugly (by & do.call, multiple *apply function calls etc.) solutions exist but is there some really easy and elegant?
The average value in the first row across the first and third columns is 1. The average value in the second row across the first and third columns is 7. And so on. You can find more R tutorials here.
In contrast, the ave function returns the output once per input value. There’s more to learn about the ave function! So keep on reading… Example 2 shows how to use the ave function to calculate averages by group.
Getting the average of values in the same row but different columns is a tool for understanding the relationship if one exists. Explanation of the function. The rowMeans ()average function finds the average numeric vector of a dataframe or other multi-column data set, like an array or a matrix.
The only minimally tricky aspect is that some columns contain NAs. I want to create a Col4 that averages the entries in the first 3 columns, ignoring the NAs. So: but it doesn't. Show activity on this post. Show activity on this post. Why NOT the accepted answer?
Perhaps I'm missing something, but an apply()
approach here would work very well and wouldn't be ugly or require any ugly hacks. Some dummy data:
df <- data.frame(A = rnorm(20), B = rnorm(20), site = gl(5,4), month = gl(10, 2))
what is wrong with:
sapply(df[, c("A","B")], ave, df$site, df$month)
? Coerce that to a data frame via data.frame()
if you really want that.
R> sapply(df[, c("A","B")], ave, df$site, df$month)
A B
[1,] 0.0775 0.04845
[2,] 0.0775 0.04845
[3,] -1.5563 0.43443
[4,] -1.5563 0.43443
[5,] 0.7193 0.01151
[6,] 0.7193 0.01151
[7,] -0.9243 -0.28483
[8,] -0.9243 -0.28483
[9,] 0.3316 0.14473
[10,] 0.3316 0.14473
[11,] -0.2539 0.20384
[12,] -0.2539 0.20384
[13,] 0.5558 -0.37239
[14,] 0.5558 -0.37239
[15,] 0.1976 -0.22693
[16,] 0.1976 -0.22693
[17,] 0.2031 1.11041
[18,] 0.2031 1.11041
[19,] 0.3229 -0.53818
[20,] 0.3229 -0.53818
Putting it together a bit more, how about
AVE <- function(df, cols, ...) {
dots <- list(...)
out <- sapply(df[, cols], ave, ...)
out <- data.frame(as.data.frame(dots), out)
names(out) <- c(paste0("Fac", seq_along(dots)), cols)
out
}
R> AVE(df, c("A","B"), df$site, df$month)
Fac1 Fac2 A B
1 1 1 0.0775 0.04845
2 1 1 0.0775 0.04845
3 1 2 -1.5563 0.43443
4 1 2 -1.5563 0.43443
5 2 3 0.7193 0.01151
6 2 3 0.7193 0.01151
7 2 4 -0.9243 -0.28483
8 2 4 -0.9243 -0.28483
9 3 5 0.3316 0.14473
10 3 5 0.3316 0.14473
11 3 6 -0.2539 0.20384
12 3 6 -0.2539 0.20384
13 4 7 0.5558 -0.37239
14 4 7 0.5558 -0.37239
15 4 8 0.1976 -0.22693
16 4 8 0.1976 -0.22693
17 5 9 0.2031 1.11041
18 5 9 0.2031 1.11041
19 5 10 0.3229 -0.53818
20 5 10 0.3229 -0.53818
The details of working with ...
escape me at the moment, but you should be able to get better names for the Fac1
etc that I used here.
I'll throw an alternative representation out there for you: aggregate()
but use the ave()
function instead of mean()
:
R> aggregate(cbind(A, B) ~ site + month, data = df, ave)
site month A.1 A.2 B.1 B.2
1 1 1 0.0775 0.0775 0.04845 0.04845
2 1 2 -1.5563 -1.5563 0.43443 0.43443
3 2 3 0.7193 0.7193 0.01151 0.01151
4 2 4 -0.9243 -0.9243 -0.28483 -0.28483
5 3 5 0.3316 0.3316 0.14473 0.14473
6 3 6 -0.2539 -0.2539 0.20384 0.20384
7 4 7 0.5558 0.5558 -0.37239 -0.37239
8 4 8 0.1976 0.1976 -0.22693 -0.22693
9 5 9 0.2031 0.2031 1.11041 1.11041
10 5 10 0.3229 0.3229 -0.53818 -0.53818
Note quite the stated output, but it is something that is simple to reshape if needed.
If you want to have a data.frame back
library(plyr)
## assuming that the_cols are string
## if col index just add the index of site and month
the_cols <- c("site", "month", the_cols)
ddply(df, c('site', 'month'), FUN = numcolwise(mean))[,the_cols]
You can use by
with colMeans
by(df[,the_cols], df[,c('site', 'month')], FUN = colMeans)
You can also use ave
inside lapply
:
res <- lapply(df[,the_cols], function(x)
ave(x, df[,c('site', 'month')], FUN = mean))
data.frame(res) # create data frame
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With