Using group_by with mutate_if by column name

Tags: r, dplyr, tidyverse

I am trying to use mutate_if to perform calculations based on the variable name. For example, if a variable name contains "demo", calculate the mean, and if it contains "meas", calculate the median:

library(tidyverse)
library(stringr)

exm_data <- tibble(
  group = sample(letters[1:5], size = 50, replace = TRUE),
  demo_age = rnorm(50),
  demo_height = runif(50, min = 48, max = 80),
  meas_score1 = rnorm(50),
  meas_score2 = rnorm(50)
)
exm_data
#> # A tibble: 50 x 5
#>    group    demo_age demo_height  meas_score1 meas_score2
#>    <chr>       <dbl>       <dbl>        <dbl>       <dbl>
#>  1     a -1.46539563    58.22435 -0.760692567   0.1077901
#>  2     b  1.90983770    56.57976  0.262933462  -1.0186600
#>  3     c  0.58502114    66.26322  2.283491647   0.3215542
#>  4     b -0.97228337    74.82932  2.447551824  -0.4763201
#>  5     a  0.65814161    72.19627 -0.592671739  -0.0521247
#>  6     c -0.62133706    75.49976  0.005813255  -0.4195284
#>  7     b  0.40650836    60.99083  0.809183477  -0.1127530
#>  8     c -0.48251421    50.94077 -1.171749420   1.7268231
#>  9     b  1.24476630    71.39803  1.786950340   0.7980217
#> 10     c -0.09704469    69.52001 -0.511872217  -1.1465523
#> # ... with 40 more rows


exm_data %>%
  mutate_if(str_detect(colnames(.), "demo"), mean) %>%
  mutate_if(str_detect(colnames(.), "meas"), median)
#> # A tibble: 50 x 5
#>    group    demo_age demo_height meas_score1 meas_score2
#>    <chr>       <dbl>       <dbl>       <dbl>       <dbl>
#>  1     a -0.03250753    64.31412 -0.09909911   0.1307904
#>  2     b -0.03250753    64.31412 -0.09909911   0.1307904
#>  3     c -0.03250753    64.31412 -0.09909911   0.1307904
#>  4     b -0.03250753    64.31412 -0.09909911   0.1307904
#>  5     a -0.03250753    64.31412 -0.09909911   0.1307904
#>  6     c -0.03250753    64.31412 -0.09909911   0.1307904
#>  7     b -0.03250753    64.31412 -0.09909911   0.1307904
#>  8     c -0.03250753    64.31412 -0.09909911   0.1307904
#>  9     b -0.03250753    64.31412 -0.09909911   0.1307904
#> 10     c -0.03250753    64.31412 -0.09909911   0.1307904
#> # ... with 40 more rows

As you can see, this works as expected. However, I want to do these calculations by group, and when I add the group_by statement, it breaks:

exm_data %>%
  group_by(group) %>%
  mutate_if(str_detect(colnames(.), "demo"), mean) %>%
  mutate_if(str_detect(colnames(.), "meas"), median)
#> Error: length(.p) == length(vars) is not TRUE

Is there a way to use mutate_if on a grouped tibble using column names?

asked Oct 06 '17 at 13:10 by Jake Thompson


1 Answer

You can use mutate_at together with the contains() helper, which selects columns by name and works on grouped tibbles:

library(dplyr)

exm_data %>%
  group_by(group) %>%
  mutate_at(vars(contains('demo')), mean) %>%
  mutate_at(vars(contains('meas')), median)

which gives,

# A tibble: 50 x 5
# Groups:   group [5]
   group    demo_age demo_height meas_score1 meas_score2
   <chr>       <dbl>       <dbl>       <dbl>       <dbl>
 1     d  0.12916082    60.26550   0.1932882  -0.5356818
 2     b -0.31142894    64.50839   0.3219514  -0.4777860
 3     b -0.31142894    64.50839   0.3219514  -0.4777860
 4     a -0.34373403    64.84180   0.1929516  -0.3821047
 5     a -0.34373403    64.84180   0.1929516  -0.3821047
 6     b -0.31142894    64.50839   0.3219514  -0.4777860
 7     d  0.12916082    60.26550   0.1932882  -0.5356818
 8     a -0.34373403    64.84180   0.1929516  -0.3821047
 9     d  0.12916082    60.26550   0.1932882  -0.5356818
10     c -0.05963747    59.07845  -0.2395409  -0.4484245
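
As for why the original attempt fails: a likely explanation, consistent with the error message, is that the scoped verbs set grouping columns aside on a grouped tibble. The logical vector built from colnames(.) has five entries (it includes group), while mutate_if only considers the four non-grouping columns. A minimal sketch of that mismatch, using a small made-up data frame (toy, n_predicate and n_candidates are hypothetical names, not part of the question):

```r
library(dplyr)
library(stringr)

# Hypothetical toy data with the same column layout as exm_data
toy <- tibble(
  group = c("a", "a", "b", "b"),
  demo_age = c(1, 3, 5, 7),
  demo_height = c(60, 62, 70, 72),
  meas_score1 = c(0.1, 0.3, 0.5, 0.9),
  meas_score2 = c(2, 4, 6, 10)
)
grouped <- group_by(toy, group)

# Logical vector built from every column name, grouping column included
n_predicate <- length(str_detect(colnames(grouped), "demo"))

# Columns that remain once the grouping column is set aside
n_candidates <- length(setdiff(colnames(grouped), group_vars(grouped)))

c(predicate = n_predicate, candidates = n_candidates)
```

Selecting by name with vars(contains(...)) avoids the problem because the selection is resolved against the columns themselves rather than against a precomputed logical vector.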

Bonus: you don't need to load stringr, because contains() is available once dplyr is loaded.
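
On dplyr 1.0.0 or later, the scoped verbs (mutate_if, mutate_at) are superseded by across(). A sketch of the same grouped calculation in that style, assuming a recent dplyr and using a small hypothetical data frame (toy) rather than the data from the question:

```r
library(dplyr)

# Hypothetical toy data with the same column layout as exm_data
toy <- tibble(
  group = c("a", "a", "b", "b"),
  demo_age = c(1, 3, 5, 7),
  demo_height = c(60, 62, 70, 72),
  meas_score1 = c(0.1, 0.3, 0.5, 0.9),
  meas_score2 = c(2, 4, 6, 10)
)

result <- toy %>%
  group_by(group) %>%
  mutate(
    across(contains("demo"), mean),    # group-wise mean of the "demo" columns
    across(contains("meas"), median)   # group-wise median of the "meas" columns
  )

result
```

The grouping column is left untouched, and each row receives the summary value of its own group, as in the mutate_at version.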

answered Nov 14 '22 at 23:11 by Sotos