R ave by columns

Tags:

dataframe

r

I want to use the ave function on many columns (tens) of a data frame:

ave(df[,the_cols], df[,c('site', 'month')], FUN = mean)

The problem is that ave runs the mean function on all the_cols columns together. Is there any way to run it for each of the_cols columns separately?

I looked at the other functions. tapply and aggregate are different: they return only one row per group. I need the ave behaviour, i.e. the same number of rows as in the original df. There is also a by function, but using it would be very clumsy, as it returns a complicated list structure that would have to be converted somehow.

Certainly many clumsy and ugly solutions exist (by & do.call, multiple *apply function calls, etc.), but is there a really easy and elegant one?

asked Jan 24 '14 19:01

Tomas

3 Answers

Perhaps I'm missing something, but an apply() approach here would work very well and wouldn't be ugly or require any ugly hacks. Some dummy data:

df <- data.frame(A = rnorm(20), B = rnorm(20), site = gl(5,4), month = gl(10, 2))

what is wrong with:

sapply(df[, c("A","B")], ave, df$site, df$month)

? Coerce that to a data frame via data.frame() if you really want that.

R> sapply(df[, c("A","B")], ave, df$site, df$month)
            A        B
 [1,]  0.0775  0.04845
 [2,]  0.0775  0.04845
 [3,] -1.5563  0.43443
 [4,] -1.5563  0.43443
 [5,]  0.7193  0.01151
 [6,]  0.7193  0.01151
 [7,] -0.9243 -0.28483
 [8,] -0.9243 -0.28483
 [9,]  0.3316  0.14473
[10,]  0.3316  0.14473
[11,] -0.2539  0.20384
[12,] -0.2539  0.20384
[13,]  0.5558 -0.37239
[14,]  0.5558 -0.37239
[15,]  0.1976 -0.22693
[16,]  0.1976 -0.22693
[17,]  0.2031  1.11041
[18,]  0.2031  1.11041
[19,]  0.3229 -0.53818
[20,]  0.3229 -0.53818
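
As a quick sanity check on the call above, here is a minimal sketch that compares the per-column ave() result against group means computed independently with tapply(). The seed and the helper names g and manual are assumptions added for illustration; they are not part of the original answer.

```r
set.seed(1)  # assumption: a seed, added only so the sketch is reproducible
df <- data.frame(A = rnorm(20), B = rnorm(20),
                 site = gl(5, 4), month = gl(10, 2))

# One ave() call per column, each grouped by site and month
res <- sapply(df[, c("A", "B")], ave, df$site, df$month)

# Group means of column A computed independently with tapply(),
# then expanded back to one value per original row
g <- interaction(df$site, df$month, drop = TRUE)
manual <- as.numeric(tapply(df$A, g, mean)[as.character(g)])

dim(res)                                    # rows and columns of the result
all.equal(as.numeric(res[, "A"]), manual)   # compare with the ave() result
```

The result keeps one row per row of df, which is the ave behaviour the question asks for.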

Putting it together a bit more, how about

AVE <- function(df, cols, ...) {
  dots <- list(...)
  out <- sapply(df[, cols], ave, ...)
  out <- data.frame(as.data.frame(dots), out)
  names(out) <- c(paste0("Fac", seq_along(dots)), cols)
  out
}

R> AVE(df, c("A","B"), df$site, df$month)
   Fac1 Fac2       A        B
1     1    1  0.0775  0.04845
2     1    1  0.0775  0.04845
3     1    2 -1.5563  0.43443
4     1    2 -1.5563  0.43443
5     2    3  0.7193  0.01151
6     2    3  0.7193  0.01151
7     2    4 -0.9243 -0.28483
8     2    4 -0.9243 -0.28483
9     3    5  0.3316  0.14473
10    3    5  0.3316  0.14473
11    3    6 -0.2539  0.20384
12    3    6 -0.2539  0.20384
13    4    7  0.5558 -0.37239
14    4    7  0.5558 -0.37239
15    4    8  0.1976 -0.22693
16    4    8  0.1976 -0.22693
17    5    9  0.2031  1.11041
18    5    9  0.2031  1.11041
19    5   10  0.3229 -0.53818
20    5   10  0.3229 -0.53818

The details of working with ... escape me at the moment, but you should be able to get better names for the Fac1 etc that I used here.
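
One possible way to keep the original factor names, sketched under the assumption that the grouping variables are columns of df and can be passed by name. AVE2 and its by argument are hypothetical names introduced here, not something from the answer above.

```r
df <- data.frame(A = rnorm(20), B = rnorm(20),
                 site = gl(5, 4), month = gl(10, 2))

# Hypothetical variant of AVE(): 'by' holds grouping column names
# (an assumption; the original passes the factors themselves via ...)
AVE2 <- function(df, cols, by, FUN = mean) {
  # ave() is called once per column in 'cols', grouped by the 'by' columns
  out <- lapply(df[cols], function(x) ave(x, df[by], FUN = FUN))
  # df[by] carries the grouping columns along under their own names
  cbind(df[by], as.data.frame(out))
}

res2 <- AVE2(df, c("A", "B"), c("site", "month"))
head(res2)
```

Passing names instead of vectors sidesteps the ... handling entirely, at the cost of only working with grouping variables that live in df.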

I'll throw an alternative representation out there for you: aggregate() but use the ave() function instead of mean():

R> aggregate(cbind(A, B) ~ site + month, data = df, ave)
   site month     A.1     A.2      B.1      B.2
1     1     1  0.0775  0.0775  0.04845  0.04845
2     1     2 -1.5563 -1.5563  0.43443  0.43443
3     2     3  0.7193  0.7193  0.01151  0.01151
4     2     4 -0.9243 -0.9243 -0.28483 -0.28483
5     3     5  0.3316  0.3316  0.14473  0.14473
6     3     6 -0.2539 -0.2539  0.20384  0.20384
7     4     7  0.5558  0.5558 -0.37239 -0.37239
8     4     8  0.1976  0.1976 -0.22693 -0.22693
9     5     9  0.2031  0.2031  1.11041  1.11041
10    5    10  0.3229  0.3229 -0.53818 -0.53818

Not quite the stated output, but it is something that is simple to reshape if needed.

answered Oct 13 '22 11:10

Gavin Simpson

If you want to have a data.frame back

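library(plyr)
## assuming the_cols is a character vector of column names;
## if it holds column indices, add the indices of site and month instead
the_cols <- c("site", "month", the_cols)
## ddply's function argument is .fun (not FUN); note that this returns
## one row per site/month group rather than one row per original row
ddply(df, c('site', 'month'), .fun = numcolwise(mean))[, the_cols]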

answered Oct 13 '22 13:10

dickoa

You can use by with colMeans

by(df[,the_cols], df[,c('site', 'month')], FUN = colMeans)

You can also use ave inside lapply:

res <- lapply(df[,the_cols], function(x)
  ave(x, df[,c('site', 'month')], FUN = mean))

data.frame(res) # create data frame
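
If the averaged columns should sit next to the untouched ones, the lapply result can be assigned back into a copy of the data frame. This is a minimal sketch with dummy data; the copy named out and the example contents of the_cols are assumptions made for illustration.

```r
df <- data.frame(A = rnorm(20), B = rnorm(20),
                 site = gl(5, 4), month = gl(10, 2))
the_cols <- c("A", "B")  # assumption: the columns to average, given by name

# Work on a copy so the raw values in df stay available
out <- df

# Replace each selected column with its per-group mean;
# site and month are left exactly as they were
out[the_cols] <- lapply(df[the_cols], function(x)
  ave(x, df$site, df$month, FUN = mean))

head(out)
```

Swapping mean for another function such as median in FUN gives the same layout with a different group summary.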

answered Oct 13 '22 12:10

Sven Hohenstein
