I have a dataframe with hundreds of columns. For illustration, here is a toy dataframe:
TPT_A_2 | TPT_B_2 | TPT_C_2 | TPT_A_4 | TPT_B_4 | TPT_C_4 | TPT_A_6 | TPT_B_6 | TPT_C_6 |
100 100 100 200 200 200 400 400 400
I want to compute the mean of the variables that share the same initial substring in their name (TPT_A, TPT_B, ...) and end in 2 or 4. The result would look something like:
TPT_A_mean | TPT_B_mean | TPT_C_mean | TPT_A_6 | TPT_B_6 | TPT_C_6 |
150 150 150 400 400 400
The data can be built with:
row1 <- c("TPT_A_2", "TPT_B_2", "TPT_C_2","TPT_A_4", "TPT_B_4", "TPT_C_4", "TPT_A_6", "TPT_B_6", "TPT_C_6")
row2 <- c(100, 100, 100, 200, 200, 200, 400, 40, 400)
data <- as.data.frame(rbind(row1, row2))
colnames(data) <- as.character(data[1,])
data <- data[-1,]
First, your method for generating a frame is an anti-pattern: rbind() on a character vector and a numeric vector produces a character matrix, so your numbers are converted to strings.
str(data)
# 'data.frame': 1 obs. of 9 variables:
# $ TPT_A_2: chr "100"
# $ TPT_B_2: chr "100"
# $ TPT_C_2: chr "100"
# $ TPT_A_4: chr "200"
# $ TPT_B_4: chr "200"
# $ TPT_C_4: chr "200"
# $ TPT_A_6: chr "400"
# $ TPT_B_6: chr "40"
# $ TPT_C_6: chr "400"
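If the frame has already been built this way, one way to repair it is to convert each column back to numeric. The following is a minimal sketch on a hypothetical two-column version of the question's data (the names `x`, `y`, `bad` and `fixed` are made up for illustration), not the only way to do it:

```r
# Hypothetical two-column version of the question's construction
row1 <- c("x", "y")
row2 <- c(100, 200)

# A matrix holds a single type, so rbind() coerces the numbers to strings
m <- rbind(row1, row2)
typeof(m)
# [1] "character"

# Same steps as in the question: first row becomes the header, then is dropped
bad <- as.data.frame(m, stringsAsFactors = FALSE)
colnames(bad) <- as.character(bad[1, ])
bad <- bad[-1, ]

# Repair: convert every column back to numeric, keeping the data.frame shape
fixed <- bad
fixed[] <- lapply(bad, as.numeric)
str(fixed)
```

Assigning into `fixed[]` rather than `fixed` keeps the object a data.frame instead of replacing it with a plain list.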
Instead, build the frame from a named list so that each column keeps its numeric type:
row1 <- c("TPT_A_2", "TPT_B_2", "TPT_C_2","TPT_A_4", "TPT_B_4", "TPT_C_4", "TPT_A_6", "TPT_B_6", "TPT_C_6")
row2 <- c(100, 100, 100, 200, 200, 200, 400, 40, 400)
dat <- as.data.frame(setNames(as.list(row2),row1))
str(dat)
# 'data.frame': 1 obs. of 9 variables:
# $ TPT_A_2: num 100
# $ TPT_B_2: num 100
# $ TPT_C_2: num 100
# $ TPT_A_4: num 200
# $ TPT_B_4: num 200
# $ TPT_C_4: num 200
# $ TPT_A_6: num 400
# $ TPT_B_6: num 40
# $ TPT_C_6: num 400
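From here, in base R: separate the columns ending in 2 or 4 from the rest, average them within each prefix, and bind the means to the remaining columns.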
dat2a <- subset(dat, select = grepl("TPT_[ABC]_[24]", colnames(dat)))
dat2b <- subset(dat, select = !grepl("TPT_[ABC]_[24]", colnames(dat)))
cbind(
dat2b,
lapply(split.default(dat2a, gsub("_[24]$", "", colnames(dat2a))),
function(z) mean(unlist(z)))
)
# TPT_A_6 TPT_B_6 TPT_C_6 TPT_A TPT_B TPT_C
# 1 400 40 400 150 150 150
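Note that `mean(unlist(z))` collapses each group to a single number, which is fine for the one-row example. If the real frame has several rows and a mean per row is wanted, a variant based on `rowMeans()` might look like the sketch below. The two-row data and the names `is_24`, `groups`, `means` and `res` are assumptions made up for illustration:

```r
# Hypothetical two-row frame using the same column naming scheme
dat <- data.frame(
  TPT_A_2 = c(100, 10), TPT_B_2 = c(100, 20),
  TPT_A_4 = c(200, 30), TPT_B_4 = c(200, 40),
  TPT_A_6 = c(400, 50), TPT_B_6 = c(40, 60)
)

# Columns whose names end in _2 or _4
is_24 <- grepl("_[24]$", colnames(dat))

# Group those columns by their prefix (TPT_A, TPT_B, ...)
groups <- split.default(dat[is_24], sub("_[24]$", "", colnames(dat)[is_24]))

# One mean per row within each group
means <- lapply(groups, rowMeans)
names(means) <- paste0(names(means), "_mean")

# Means first, then the untouched columns
res <- cbind(as.data.frame(means), dat[!is_24])
res
```

Because the grouping key is derived from the column names, the same code should work unchanged with hundreds of prefixes, as long as they follow the `prefix_number` pattern.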
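Or, with dplyr and purrr, which also lets us add the _mean suffix and leave single-column groups untouched:
library(dplyr)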
library(purrr) # imap
dat %>%
split.default(., gsub("_[24]$", "", colnames(.))) %>%
imap(., function(x, nm) {
if (ncol(x) > 1) {
setNames(data.frame(mean(unlist(x))), paste0(nm, "_mean"))
} else x
}) %>%
bind_cols()
# TPT_A_mean TPT_A_6 TPT_B_mean TPT_B_6 TPT_C_mean TPT_C_6
# 1 150 400 150 40 150 400
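The `ncol(x) > 1` test works because of how the grouping key is built: stripping a trailing `_2` or `_4` makes those columns share a key, while a `_6` column keeps its full name and ends up alone in its group. A small illustration (the vector `cols` is just an example):

```r
cols <- c("TPT_A_2", "TPT_A_4", "TPT_A_6")

# Only a trailing _2 or _4 is removed; _6 is left as-is
keys <- sub("_[24]$", "", cols)
keys
# [1] "TPT_A"   "TPT_A"   "TPT_A_6"

# Number of columns per group: 2 for the averaged group, 1 for the untouched one
table(keys)
```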