R column mean by factor

Tags:

r

I have a data set like this:

data
name v1  v2  v3  v4  v5
a    1   2   7   9   3
b    3   8   6   4   8
c    2   5   0   1   9
a    6   0   6   2   1
c    3   9   4   7   5

name is a factor variable. I want to calculate the mean of v2, v3, v4 and v5 by the factor data$name. I used the following command, but it did not work:

tapply(data[,3:6],data$name,mean)

Now I use the following code:

newdata<-0
for (name in unique(data$name)){
    rowIndex <- which(data$name == name)
    result <- colMeans(data[rowIndex,])
    newdata[name,]<-result
}

This gives the required result, but I want to know if there is a sleeker way to do this.


asked Sep 18 '14 09:09

Prabhu

2 Answers

Here's another way

library(data.table)
cols <- paste0("v", 2:5) # set the columns you want to operate on
setDT(data)[, Sums := rowSums(.SD), .SDcols = cols]
data[, list(Means = sum(Sums)/(.N*length(cols))), by = name]
##    name Means
## 1:    a  3.75
## 2:    b  6.50
## 3:    c  5.00
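
As a cross-check of the arithmetic above, here is a minimal base R sketch of the same pooled mean: the sum of all selected cells in a group divided by (rows in the group × number of columns). The data frame is rebuilt from the table in the question; the name `pooled` is only illustrative.

```r
# Rebuild the example data from the question
data <- data.frame(
  name = factor(c("a", "b", "c", "a", "c")),
  v1 = c(1, 3, 2, 6, 3),
  v2 = c(2, 8, 5, 0, 9),
  v3 = c(7, 6, 0, 6, 4),
  v4 = c(9, 4, 1, 2, 7),
  v5 = c(3, 8, 9, 1, 5)
)
cols <- paste0("v", 2:5)  # the columns to average over

# Split the selected columns by group, then compute
# sum of all cells / (number of rows * number of columns)
pooled <- sapply(split(data[, cols], data$name), function(d) {
  sum(as.matrix(d)) / (nrow(d) * length(cols))
})
pooled
```

For group a this is (2+7+9+3 + 0+6+2+1) / (2 * 4) = 30 / 8 = 3.75, which matches the data.table result.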

Edit

Per @Arun's suggestion, this is probably much better:

setDT(data)[, mean(c(v2,v3,v4,v5)), by=name]
##    name   V1
## 1:    a 3.75
## 2:    b 6.50
## 3:    c 5.00

Or, per @Ananda's suggestion:

library(reshape2)
melt(setDT(data), id.vars = "name", measure.vars = cols)[, mean(value), by = name]
##    name   V1
## 1:    a 3.75
## 2:    b 6.50
## 3:    c 5.00
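
All of the snippets above return one pooled mean per name. The loop in the question uses colMeans, so if a separate mean for each column is what's wanted, something along these lines should work. This is a sketch, assuming the data from the question; `dt` and `per_col` are illustrative names, not from the original post.

```r
library(data.table)

# Rebuild the example data from the question as a data.table
dt <- data.table(
  name = factor(c("a", "b", "c", "a", "c")),
  v1 = c(1, 3, 2, 6, 3),
  v2 = c(2, 8, 5, 0, 9),
  v3 = c(7, 6, 0, 6, 4),
  v4 = c(9, 4, 1, 2, 7),
  v5 = c(3, 8, 9, 1, 5)
)
cols <- paste0("v", 2:5)  # the columns to average

# .SD holds the columns listed in .SDcols for each group;
# lapply(.SD, mean) returns one mean per column, per name
per_col <- dt[, lapply(.SD, mean), by = name, .SDcols = cols]
per_col
```

The result has one row per name and one column per variable, for example v2 is 1 for a, 8 for b and 7 for c.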

answered Oct 13 '22 05:10

David Arenburg

This can be done with a combination of the dplyr and tidyr packages:

library(dplyr)
library(tidyr)

# Use a key name other than "name", which already exists as a column
data %>% gather(variable, value, v2:v5) %>%
    group_by(name) %>% summarize(average=mean(value))
#   name average
# 1    a    3.75
# 2    b    6.50
# 3    c    5.00

This works because gather brings the v2:v5 columns together into a single value column, so the values can be grouped by name:

data %>% gather(variable, value, v2:v5)
#   name v1 variable value
# 1    a  1       v2     2
# 2    b  3       v2     8
# 3    c  2       v2     5
# 4    a  6       v2     0
# 5    c  3       v2     9
# 6    a  1       v3     7
# ...
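
In newer versions of tidyr, gather is superseded by pivot_longer, so the same reshape-then-group idea can be sketched as below. This assumes the data from the question; the key column is called `variable` here only to keep it distinct from the existing name column, and `result` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
data <- data.frame(
  name = factor(c("a", "b", "c", "a", "c")),
  v1 = c(1, 3, 2, 6, 3),
  v2 = c(2, 8, 5, 0, 9),
  v3 = c(7, 6, 0, 6, 4),
  v4 = c(9, 4, 1, 2, 7),
  v5 = c(3, 8, 9, 1, 5)
)

# Stack v2:v5 into one long "value" column, then average per name
result <- data %>%
  pivot_longer(cols = v2:v5, names_to = "variable", values_to = "value") %>%
  group_by(name) %>%
  summarize(average = mean(value))
result
```

The row order of the long table differs from gather's, but the grouped means are the same.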

answered Oct 13 '22 05:10

David Robinson
