I have a data set, data, like this:
name v1 v2 v3 v4 v5
a 1 2 7 9 3
b 3 8 6 4 8
c 2 5 0 1 9
a 6 0 6 2 1
c 3 9 4 7 5
name is a factor variable. I want to calculate the mean of v2, v3, v4 and v5 by the factor data$name. I used the following command, but it did not work:
tapply(data[,3:6],data$name,mean)
So I used the following code instead:
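cols <- paste0("v", 2:5)                    # numeric columns to average
groups <- as.character(unique(data$name))
# preallocate one row per group and one column per variable
newdata <- matrix(NA_real_, nrow = length(groups), ncol = length(cols),
                  dimnames = list(groups, cols))
for (name in groups) {
  rowIndex <- which(data$name == name)
  # restrict to the numeric columns; colMeans() fails on the factor column
  newdata[name, ] <- colMeans(data[rowIndex, cols])
}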
This gives the required result, but I would like to know if there is a sleeker way to do it.
R has a built-in mean() function that computes the arithmetic mean of a numeric vector. It does not average every column of a data frame: calling it on a data frame returns NA with a warning, so use colMeans(df) or sapply(df, mean) for per-column means. Two common forms are mean(x) and mean(x, trim = 0.1), where trim is the fraction of observations dropped from each end of the sorted values before averaging.
A factor in R is a variable used to store categorical data, that is, data with a limited number of distinct values. Internally it is an integer vector of codes plus a character vector of levels that gives each code its label.
The mean, or average, of a dataset is calculated by adding all the values in the dataset and then dividing by the number of values. For example, for the dataset [1, 2, 3], the mean is (1 + 2 + 3) / 3 = 2.
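To make these definitions concrete, here is a minimal sketch using the data from the question. The data frame is rebuilt by hand so the block runs on its own, and the variable names mean_v2, trimmed_v2 and col_means are only illustrative.

```r
# Example data from the question, rebuilt so the block is self-contained
data <- data.frame(
  name = factor(c("a", "b", "c", "a", "c")),
  v1 = c(1, 3, 2, 6, 3),
  v2 = c(2, 8, 5, 0, 9),
  v3 = c(7, 6, 0, 6, 4),
  v4 = c(9, 4, 1, 2, 7),
  v5 = c(3, 8, 9, 1, 5)
)

# mean() works on a single numeric vector
mean_v2 <- mean(data$v2)                  # (2 + 8 + 5 + 0 + 9) / 5 = 4.8

# trim = 0.2 drops one value from each end of the 5 sorted values
trimmed_v2 <- mean(data$v2, trim = 0.2)   # mean of 2, 5, 8 = 5

# per-column means need colMeans() (or sapply) on the numeric columns only
col_means <- colMeans(data[, c("v2", "v3", "v4", "v5")])

# a factor is integer codes plus a set of level labels
levels(data$name)       # "a" "b" "c"
as.integer(data$name)   # 1 2 3 1 3
```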
Here's another way
library(data.table)
cols <- paste0("v", 2:5) # set the columns you want to operate on
setDT(data)[, Sums := rowSums(.SD), .SDcols = cols]
data[, list(Means = sum(Sums)/(.N*length(cols))), by = name]
##    name Means
## 1:    a  3.75
## 2:    b  6.50
## 3:    c  5.00
Edit
Per @Arun's suggestion, this would probably be much better:
setDT(data)[, mean(c(v2,v3,v4,v5)), by=name]
##    name   V1
## 1:    a 3.75
## 2:    b 6.50
## 3:    c 5.00
Or, per @Ananda's suggestion:
library(reshape2)
melt(setDT(data), id.vars = "name", measure.vars = cols)[, mean(value), by = name]
##    name   V1
## 1:    a 3.75
## 2:    b 6.50
## 3:    c 5.00
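For comparison, the same pooled means can probably be obtained in base R without any extra packages. The sketch below rebuilds the question's data by hand and uses illustrative names (pooled, stacked, per_column). It also suggests why the original tapply() call failed: tapply() expects a single vector of values, not a data frame, so the columns have to be stacked first.

```r
# Example data from the question, rebuilt so the block is self-contained
data <- data.frame(
  name = factor(c("a", "b", "c", "a", "c")),
  v1 = c(1, 3, 2, 6, 3),
  v2 = c(2, 8, 5, 0, 9),
  v3 = c(7, 6, 0, 6, 4),
  v4 = c(9, 4, 1, 2, 7),
  v5 = c(3, 8, 9, 1, 5)
)
cols <- paste0("v", 2:5)

# Option 1: split the selected columns by group, then pool all values per group
pooled <- sapply(split(data[cols], data$name), function(d) mean(unlist(d)))

# Option 2: stack the columns into one vector so tapply() can handle them;
# the group factor is repeated once per stacked column
stacked <- tapply(unlist(data[cols]), rep(data$name, times = length(cols)), mean)

# If per-column means within each group are wanted instead, aggregate() does that
per_column <- aggregate(data[cols], by = list(name = data$name), FUN = mean)

pooled
```

Both pooled and stacked hold one value per level of name (3.75, 6.50 and 5.00 for a, b and c), while per_column keeps v2 to v5 as separate columns.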
This can be done with a combination of the dplyr and tidyr packages:
library(dplyr)
library(tidyr)
data %>% gather(variable, value, v2:v5) %>%
  group_by(name) %>% summarize(average = mean(value))
#   name average
# 1    a    3.75
# 2    b    6.50
# 3    c    5.00
This works because gather brings the v2:v5 columns together into a single value column where they can be intuitively grouped. The original column names go into a key column, called variable here so that it does not clash with the existing name column:
data %>% gather(variable, value, v2:v5)
#   name v1 variable value
# 1    a  1       v2     2
# 2    b  3       v2     8
# 3    c  2       v2     5
# 4    a  6       v2     0
# 5    c  3       v2     9
# 6    a  1       v3     7
# ...
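In current tidyr releases (1.0.0 and later), gather() is superseded by pivot_longer(). A minimal sketch of the same reshape-then-summarize idea, assuming a recent tidyr and dplyr are installed, might look like this. The data frame is rebuilt by hand, and the column names variable, value and average are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt so the block is self-contained
data <- data.frame(
  name = factor(c("a", "b", "c", "a", "c")),
  v1 = c(1, 3, 2, 6, 3),
  v2 = c(2, 8, 5, 0, 9),
  v3 = c(7, 6, 0, 6, 4),
  v4 = c(9, 4, 1, 2, 7),
  v5 = c(3, 8, 9, 1, 5)
)

result <- data %>%
  # stack v2..v5 into one "value" column; old column names go to "variable"
  pivot_longer(cols = v2:v5, names_to = "variable", values_to = "value") %>%
  group_by(name) %>%
  summarize(average = mean(value))   # pooled mean of all stacked values per group

result
```

The result has one row per level of name, with averages of 3.75, 6.50 and 5.00 for a, b and c.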