I have a data frame consisting of 10299 observations of 66 variables. Some of these variables share a common column name, and I would like to calculate the mean of those variables for each observation.
Having the following matrix, with column names c(A, B, C, B, A ,C)
:
A B C B A C
1 2 3 4 5 6
3 5 6 7 4 3
3 3 3 3 5 5
2 2 2 2 2 2
I would like to get:
A B C
3 3 4.5
3.5 6 4.5
4 3 4
2 2 2
I tried for loops, the command aggregate()
but I don't get the desired result.
Sorry if the question seems too basic, I have checked google for possible solutions but I didn't find any.
Here's a solution.
First let's define an exemplary data.frame (the same as in your example).
df <- as.data.frame(
matrix(c(1,3,3,2,2,5,3,2,3,6,3,2,4,7,3,2,5,4,5,2,6,3,5,2),
ncol=6,
dimnames=list(NULL, c("A", "B", "C", "B", "A", "C"))
)
)
Below we apply a custom function on each unique column name, col
:
it selects all the columns named col
and calculates rowMeans
. The result, list of atomic vectors, will be coerced to a data.frame:
res <- as.data.frame( # sapply returns a list here, so we convert it to a data.frame
sapply(unique(names(df)), # for each unique column name
function(col) rowMeans(df[names(df) == col]) # calculate row means
)
)
The result:
res
## A B C
## 1 3.0 3 4.5
## 2 3.5 6 4.5
## 3 4.0 3 4.0
## 4 2.0 2 2.0
EDIT: As there are many solutions proposed already, let's benchmark them:
set.seed(123)
df <- as.data.frame(matrix(sample(1:9, replace=TRUE, 10000*100),
dimnames=list(NULL, sample(LETTERS[1:5], 100, replace=TRUE)), ncol=100))
library(microbenchmark)
microbenchmark(...)
## Unit: milliseconds
## min lq median uq max neval
## @gagolews 61.196075 65.73211 77.22533 119.42028 127.32557 10
## @joran 8.297964 10.05242 10.90564 15.25943 65.69156 10
## @Davide 5535.272680 5731.24220 5754.67006 5808.47807 5862.22628 10
The clear winner (at least as far as speed is concerned) is @joran's lapply
+split
+Reduce
. Congrats! :-)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With