I am trying to use mutate_if to perform calculations based on the variable name. For example, if the variable name contains "demo", calculate the mean, and if the name contains "meas", calculate the median:
library(tidyverse)
library(stringr)
# tibble() replaces the deprecated data_frame()
exm_data <- tibble(
  group = sample(letters[1:5], size = 50, replace = TRUE),
  demo_age = rnorm(50),
  demo_height = runif(50, min = 48, max = 80),
  meas_score1 = rnorm(50),
  meas_score2 = rnorm(50)
)
exm_data
#> # A tibble: 50 x 5
#> group demo_age demo_height meas_score1 meas_score2
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a -1.46539563 58.22435 -0.760692567 0.1077901
#> 2 b 1.90983770 56.57976 0.262933462 -1.0186600
#> 3 c 0.58502114 66.26322 2.283491647 0.3215542
#> 4 b -0.97228337 74.82932 2.447551824 -0.4763201
#> 5 a 0.65814161 72.19627 -0.592671739 -0.0521247
#> 6 c -0.62133706 75.49976 0.005813255 -0.4195284
#> 7 b 0.40650836 60.99083 0.809183477 -0.1127530
#> 8 c -0.48251421 50.94077 -1.171749420 1.7268231
#> 9 b 1.24476630 71.39803 1.786950340 0.7980217
#> 10 c -0.09704469 69.52001 -0.511872217 -1.1465523
#> # ... with 40 more rows
exm_data %>%
  mutate_if(str_detect(colnames(.), "demo"), mean) %>%
  mutate_if(str_detect(colnames(.), "meas"), median)
#> # A tibble: 50 x 5
#> group demo_age demo_height meas_score1 meas_score2
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a -0.03250753 64.31412 -0.09909911 0.1307904
#> 2 b -0.03250753 64.31412 -0.09909911 0.1307904
#> 3 c -0.03250753 64.31412 -0.09909911 0.1307904
#> 4 b -0.03250753 64.31412 -0.09909911 0.1307904
#> 5 a -0.03250753 64.31412 -0.09909911 0.1307904
#> 6 c -0.03250753 64.31412 -0.09909911 0.1307904
#> 7 b -0.03250753 64.31412 -0.09909911 0.1307904
#> 8 c -0.03250753 64.31412 -0.09909911 0.1307904
#> 9 b -0.03250753 64.31412 -0.09909911 0.1307904
#> 10 c -0.03250753 64.31412 -0.09909911 0.1307904
#> # ... with 40 more rows
As you can see, this works as expected. However, I want to do these calculations by group, and when I add the group_by statement it breaks:
exm_data %>%
  group_by(group) %>%
  mutate_if(str_detect(colnames(.), "demo"), mean) %>%
  mutate_if(str_detect(colnames(.), "meas"), median)
#> Error: length(.p) == length(vars) is not TRUE
Is there a way to use mutate_if on a grouped tibble using column names?
You can use mutate_at along with contains from dplyr, as follows:
library(dplyr)
# Pass the functions directly; funs() is deprecated in newer dplyr versions
exm_data %>%
  group_by(group) %>%
  mutate_at(vars(contains("demo")), mean) %>%
  mutate_at(vars(contains("meas")), median)
which gives,
# A tibble: 50 x 5
# Groups: group [5]
group demo_age demo_height meas_score1 meas_score2
<chr> <dbl> <dbl> <dbl> <dbl>
1 d 0.12916082 60.26550 0.1932882 -0.5356818
2 b -0.31142894 64.50839 0.3219514 -0.4777860
3 b -0.31142894 64.50839 0.3219514 -0.4777860
4 a -0.34373403 64.84180 0.1929516 -0.3821047
5 a -0.34373403 64.84180 0.1929516 -0.3821047
6 b -0.31142894 64.50839 0.3219514 -0.4777860
7 d 0.12916082 60.26550 0.1932882 -0.5356818
8 a -0.34373403 64.84180 0.1929516 -0.3821047
9 d 0.12916082 60.26550 0.1932882 -0.5356818
10 c -0.05963747 59.07845 -0.2395409 -0.4484245
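As for why the original attempt fails, a likely explanation (my reading of the error message, not something I have confirmed in the dplyr source for every version) is that on a grouped tibble mutate_if only considers the non-grouping columns, while colnames(.) still includes the grouping column. The logical vector is then one element longer than the set of candidate columns. A minimal sketch of that mismatch, where the seed and the names grouped, all_cols and non_group_cols are just illustrative:

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

exm_data <- tibble(
  group = sample(letters[1:5], size = 50, replace = TRUE),
  demo_age = rnorm(50),
  demo_height = runif(50, min = 48, max = 80),
  meas_score1 = rnorm(50),
  meas_score2 = rnorm(50)
)

grouped <- group_by(exm_data, group)

# colnames() still reports every column, including the grouping one
all_cols <- colnames(grouped)

# Columns left over once the grouping variable is set aside
non_group_cols <- setdiff(all_cols, group_vars(grouped))

length(all_cols)        # 5
length(non_group_cols)  # 4
```

Selecting with vars(contains(...)) sidesteps this because the selection is resolved by name rather than by a positional logical vector.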
BONUS: You don't need to load stringr.
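If you are on dplyr 1.0.0 or later, the same idea can probably be written with across() inside a single mutate() call. This is a sketch under that version assumption rather than the approach used above; the seed and the name by_group are only for illustration:

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

exm_data <- tibble(
  group = sample(letters[1:5], size = 50, replace = TRUE),
  demo_age = rnorm(50),
  demo_height = runif(50, min = 48, max = 80),
  meas_score1 = rnorm(50),
  meas_score2 = rnorm(50)
)

by_group <- exm_data %>%
  group_by(group) %>%
  mutate(
    across(contains("demo"), mean),    # per-group mean for demo_* columns
    across(contains("meas"), median)   # per-group median for meas_* columns
  ) %>%
  ungroup()

by_group
```

Every row in a group ends up with that group's mean (for the demo columns) or median (for the meas columns), just like the mutate_at version.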