I have a large data frame and want to standardise multiple columns, computing the mean and the standard deviation only from the rows that satisfy a condition. Say I have the following example data:
set.seed(123)
df = data.frame("sample" = c(rep(1:2, each = 5)),
                "status" = c(0,1),
                "s1" = runif(10, -1, 1),
                "s2" = runif(10, -5, 5),
                "s3" = runif(10, -25, 25))
I want to standardise each of s1-s3 within each sample, with the mean and standard deviation computed only from the rows where status == 0. For s1 alone, I could do the following:
df = df %>% group_by(sample) %>%
  mutate(sd_s1 = (s1 - mean(s1[status==0])) / sd(s1[status==0]))
But my problem arises when I have to perform this operation on multiple columns. I tried writing a function to include with mutate_at:
standardize <- function(x) {
return((x - mean(x[status==0]))/sd(x[status==0]))
}
df = df %>% group_by(sample) %>%
mutate_at(vars(s1:s3), standardize)
This just produces NA values for s1-s3. I have tried to adapt the answer to "R - dplyr - mutate - use dynamic variable names", but cannot figure out how to do the subsetting.
Any help is greatly appreciated. Thanks!
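You can pass the expression directly to mutate_at. Inside funs(), . stands for the column being transformed, and status is looked up in the grouped data, so the subsetting is done per sample:
df %>%
  group_by(sample) %>%
  mutate_at(vars(s1:s3), funs((. - mean(.[status == 0])) / sd(.[status == 0])))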
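For newer dplyr versions, the same idea can be sketched with across(). This assumes dplyr 1.0.0 or later, where across() is available; funs() has been deprecated since dplyr 0.8.0. The helper name standardize_by and its ref argument are hypothetical, introduced only for illustration. Passing the status condition in as an argument means the function does not have to find status on its own, which is where the standardize() attempt in the question goes wrong.

```r
library(dplyr)

# Same example data as in the question; status alternates 0, 1 down the rows.
set.seed(123)
df <- data.frame(
  sample = rep(1:2, each = 5),
  status = rep(c(0, 1), times = 5),
  s1 = runif(10, -1, 1),
  s2 = runif(10, -5, 5),
  s3 = runif(10, -25, 25)
)

# Hypothetical helper: centre and scale x using only the rows flagged by ref
# (a logical vector of the same length as x).
standardize_by <- function(x, ref) {
  (x - mean(x[ref])) / sd(x[ref])
}

out <- df %>%
  group_by(sample) %>%
  # The lambda is evaluated per group, so status here is the group's status column.
  mutate(across(s1:s3, ~ standardize_by(.x, status == 0))) %>%
  ungroup()
```

Within each sample, the status == 0 rows of each standardised column should then have mean 0 and standard deviation 1, which is a quick way to check the result.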