
In a named argument to dplyr::funs, can I reference the names of other arguments?

Tags: r, dplyr, rlang

Consider the following:

library(tidyverse)

df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)

df %>% 
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - mean(.)) / sd(.)))
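For reference, here is a sketch of the same call written for dplyr 1.0.0 or later, where funs() (soft-deprecated in dplyr 0.8.0) gives way to a named list of functions or ~ lambdas inside across(). It assumes a current dplyr and, like the original, still calls mean() and sd() twice for the scaled column:

```r
library(dplyr)

set.seed(111)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)

# Each list element becomes a new column named "{column}_{element}",
# e.g. x_avg, x_dev, x_scaled (the default .names for a list of functions).
res <- df %>%
  mutate(across(everything(),
                list(avg    = mean,
                     dev    = sd,
                     scaled = ~ (.x - mean(.x)) / sd(.x))))
```

across() only sees the columns that exist when it starts, so the new x_avg and x_dev columns are not visible to the scaled lambda. That is the same limitation the question runs into with funs().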

Is there a way to avoid calling mean and sd twice by referencing the avg and dev columns? What I have in mind is something like:

df %>% 
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - avg) / dev))

Clearly this won't work, because there are no columns named avg and dev, only x_avg, x_dev, y_avg, y_dev, etc.

Is there a good way, within funs, to use the rlang tools to create those column references programmatically, so that I can refer to columns created by the previous named arguments to funs? (When . is x, I would reference x_avg and x_dev to calculate x_scaled, and so forth.)
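Below is a minimal sketch of the rlang route, assuming dplyr 0.7 or later. It uses plain mutate() instead of funs(). Each new column's expression is built with sym() and expr() and spliced in with !!!. Because mutate() evaluates its arguments in order, each <col>_scaled expression can refer to the <col>_avg and <col>_dev columns created earlier in the same call, so mean() and sd() run only once per column. The names cols, avg_exprs, dev_exprs and scaled_exprs are illustrative, not part of any API.

```r
library(dplyr)
library(rlang)

set.seed(111)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)
cols <- names(df)

# One named expression per column, e.g. x_avg = mean(x)
avg_exprs <- set_names(
  lapply(cols, function(col) expr(mean(!!sym(col)))),
  paste0(cols, "_avg")
)
# e.g. x_dev = sd(x)
dev_exprs <- set_names(
  lapply(cols, function(col) expr(sd(!!sym(col)))),
  paste0(cols, "_dev")
)
# e.g. x_scaled = (x - x_avg) / x_dev, referring to columns that are
# created earlier in the same mutate() call
scaled_exprs <- set_names(
  lapply(cols, function(col) {
    expr(((!!sym(col)) - (!!sym(paste0(col, "_avg")))) /
           (!!sym(paste0(col, "_dev"))))
  }),
  paste0(cols, "_scaled")
)

# Splice all three groups into a single mutate(); order matters here
res <- df %>% mutate(!!!avg_exprs, !!!dev_exprs, !!!scaled_exprs)
```

The extra parentheses around each !!sym(...) are defensive. They keep the unquoted symbols from interacting with the precedence of R's unary ! operator.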

Jonathan Gilligan asked Nov 04 '18 17:11



1 Answer

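I think it would be easier if you convert your data to long format. After grouping by the original column name, avg and sd become ordinary columns, and because mutate() evaluates its arguments in order, the scaled expression can refer to them directly: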

library(tidyverse)

set.seed(111)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)

df %>% 
  gather(key, value) %>% 
  group_by(key) %>% 
  mutate(avg    = mean(value),
         sd     = sd(value),
         scaled = (value - avg) / sd)
#> # A tibble: 300 x 5
#> # Groups:   key [3]
#>    key    value     avg    sd scaled
#>    <chr>  <dbl>   <dbl> <dbl>  <dbl>
#>  1 x      0.235 -0.0128  1.07  0.232
#>  2 x     -0.331 -0.0128  1.07 -0.297
#>  3 x     -0.312 -0.0128  1.07 -0.279
#>  4 x     -2.30  -0.0128  1.07 -2.14 
#>  5 x     -0.171 -0.0128  1.07 -0.148
#>  6 x      0.140 -0.0128  1.07  0.143
#>  7 x     -1.50  -0.0128  1.07 -1.39 
#>  8 x     -1.01  -0.0128  1.07 -0.931
#>  9 x     -0.948 -0.0128  1.07 -0.874
#> 10 x     -0.494 -0.0128  1.07 -0.449
#> # ... with 290 more rows

Created on 2018-11-04 by the reprex package (v0.2.1.9000)
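If you need the results back in the original wide layout, with x_avg, x_dev, x_scaled-style columns as in the question, one possible sketch uses pivot_longer() and pivot_wider(). These functions supersede gather() and spread() as of tidyr 1.0.0, so the sketch assumes that version or later. The row column is a hypothetical helper id that lets each original row be rebuilt. The sd column is renamed dev so it does not shadow the sd() function.

```r
library(dplyr)
library(tidyr)

set.seed(111)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)

wide <- df %>%
  # Hypothetical row id so each original row can be rebuilt after reshaping
  mutate(row = row_number()) %>%
  pivot_longer(-row, names_to = "key", values_to = "value") %>%
  group_by(key) %>%
  mutate(avg    = mean(value),             # computed once per key
         dev    = sd(value),
         scaled = (value - avg) / dev) %>% # reuses avg and dev
  ungroup() %>%
  # Back to one row per original observation: x_value, x_avg, x_dev, ...
  pivot_wider(id_cols = row,
              names_from = key,
              names_glue = "{key}_{.value}",
              values_from = c(value, avg, dev, scaled))
```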

Tung answered Oct 13 '22 01:10
