I can't imagine I'm the first person with this question, but I haven't found a solution yet (here or elsewhere).
I have a few columns, which I want to average in R. The only minimally tricky aspect is that some columns contain NAs.
For example:
Trait Col1 Col2 Col3
DF    23   NA   23
DG    2    2    2
DH    NA   9    9
I want to create a Col4 that averages the entries in the first 3 columns, ignoring the NAs. So:
Trait Col1 Col2 Col3 Col4
DF    23   NA   23   23
DG    2    2    2    2
DH    NA   9    9    9
Ideally something like this would work:
data$Col4 <- mean(data$Col1, data$Col2, data$Col3, na.rm=TRUE)
but it doesn't.
You want rowMeans(), but importantly note it has a na.rm argument that you want to set to TRUE. E.g.:
> mat <- matrix(c(23,2,NA,NA,2,9,23,2,9), ncol = 3)
> mat
     [,1] [,2] [,3]
[1,]   23   NA   23
[2,]    2    2    2
[3,]   NA    9    9
> rowMeans(mat)
[1] NA  2 NA
> rowMeans(mat, na.rm = TRUE)
[1] 23  2  9
To match your example:
> dat <- data.frame(Trait = c("DF","DG","DH"), mat)
> names(dat) <- c("Trait", paste0("Col", 1:3))
> dat
  Trait Col1 Col2 Col3
1    DF   23   NA   23
2    DG    2    2    2
3    DH   NA    9    9
> dat <- transform(dat, Col4 = rowMeans(dat[,-1], na.rm = TRUE))
> dat
  Trait Col1 Col2 Col3 Col4
1    DF   23   NA   23   23
2    DG    2    2    2    2
3    DH   NA    9    9    9
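If your real data has other numeric columns you don't want in the average, dropping only the first column with dat[,-1] may pick up too much. A minimal sketch of selecting the columns by name instead, assuming they are called Col1 to Col3 as in the example (the cols vector and the all_missing data frame are made up for illustration). It also shows one edge case worth knowing about: a row in which every value is NA comes back as NaN rather than NA when na.rm = TRUE.

```r
# Rebuild the example data from the question.
dat <- data.frame(
  Trait = c("DF", "DG", "DH"),
  Col1  = c(23, 2, NA),
  Col2  = c(NA, 2, 9),
  Col3  = c(23, 2, 9)
)

# Name the columns to average explicitly (illustrative choice),
# so other columns in the data frame are left out of the mean.
cols <- c("Col1", "Col2", "Col3")
dat$Col4 <- rowMeans(dat[, cols], na.rm = TRUE)

# Edge case: a row with no observed values at all.
# With na.rm = TRUE there is nothing left to average, so the result is NaN.
all_missing <- data.frame(Col1 = NA_real_, Col2 = NA_real_, Col3 = NA_real_)
empty_mean <- rowMeans(all_missing[, cols], na.rm = TRUE)
```

If you would rather have NA for such rows, something like dat$Col4[is.nan(dat$Col4)] <- NA afterwards should do it.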