Consider the following:
library(tidyverse)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)
df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - mean(.)) / sd(.)))
Is there a way to avoid calling mean and sd twice by referencing the avg and dev columns? What I have in mind is something like:
df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - avg) / dev))
Clearly this won't work, because the columns are not named avg and dev but x_avg, x_dev, y_avg, y_dev, etc.
Is there a good way, within funs, to use the rlang tools to create those column references programmatically, so that I can refer to columns created by the previous named arguments to funs (when . is x, I would reference x_avg and x_dev for calculating x_scaled, and so forth)?
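I think it would be easier if you convert your data to long format first. Then a single grouped mutate can compute the mean and standard deviation once per variable and reuse them for the scaled column: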
library(tidyverse)
set.seed(111)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)
df %>%
  gather(key, value) %>%
  group_by(key) %>%
  mutate(avg = mean(value),
         sd = sd(value),
         scaled = (value - avg) / sd)
#> # A tibble: 300 x 5
#> # Groups: key [3]
#> key value avg sd scaled
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 x 0.235 -0.0128 1.07 0.232
#> 2 x -0.331 -0.0128 1.07 -0.297
#> 3 x -0.312 -0.0128 1.07 -0.279
#> 4 x -2.30 -0.0128 1.07 -2.14
#> 5 x -0.171 -0.0128 1.07 -0.148
#> 6 x 0.140 -0.0128 1.07 0.143
#> 7 x -1.50 -0.0128 1.07 -1.39
#> 8 x -1.01 -0.0128 1.07 -0.931
#> 9 x -0.948 -0.0128 1.07 -0.874
#> 10 x -0.494 -0.0128 1.07 -0.449
#> # ... with 290 more rows
Created on 2018-11-04 by the reprex package (v0.2.1.9000)
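If the wide layout with x_avg, x_dev, x_scaled and so on is still needed afterwards, the long-format result can be spread back out. The following is only a sketch, assuming tidyr 1.0.0 or later, where pivot_longer() and pivot_wider() supersede gather() and spread(). The helper column row and the intermediate names long and wide are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

set.seed(111)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)

# Long format: one row per (original row, variable) pair.
# `row` is a hypothetical helper id so the rows can be matched up again later.
long <- df %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row, names_to = "key", values_to = "value") %>%
  group_by(key) %>%
  # mean() and sd() are each evaluated once per group and then reused
  mutate(avg = mean(value),
         dev = sd(value),
         scaled = (value - avg) / dev) %>%
  ungroup()

# Back to wide format, naming columns as <variable>_<statistic>,
# e.g. x_value, x_avg, x_dev, x_scaled
wide <- long %>%
  pivot_wider(id_cols = row,
              names_from = key,
              values_from = c(value, avg, dev, scaled),
              names_glue = "{key}_{.value}")
```

The original values end up in x_value, y_value and z_value; rename them if the plain x, y, z names are preferred.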