
mutate_at does not create variable suffixes in some cases?

Tags: contains, r, dplyr

I have been playing with dplyr::mutate_at to create new variables by applying the same function to some of the columns. When I name my function in the .funs argument, the mutate call creates new columns with a suffix instead of replacing the existing ones, which is a cool option that I discovered in this thread.

library(dplyr)

df = data.frame(var1=1:2, var2=4:5, other=9)
df %>% mutate_at(vars(contains("var")), .funs=funs('sqrt'=sqrt))
####   var1 var2 other var1_sqrt var2_sqrt
#### 1    1    4     9  1.000000  2.000000
#### 2    2    5     9  1.414214  2.236068

However, I noticed that when the selection passed to .vars matches only one column instead of several, the new column drops the original name. Here it is named sqrt instead of other_sqrt:

df %>% mutate_at(vars(contains("other")), .funs=funs('sqrt'=sqrt))
####   var1 var2 other sqrt
#### 1    1    4     9    3
#### 2    2    5     9    3

I would like to understand why this behaviour happens and how to avoid it, because I don't know in advance how many columns contains() will match.

EDIT: The newly created columns must inherit the original name of the original columns, plus the suffix 'sqrt' at the end.

Thanks

agenis asked Feb 04 '18 22:02


2 Answers

Here is another idea. We can add setNames(sub("^sqrt$", "other_sqrt", names(.))) after the mutate_at call to rename the column sqrt to other_sqrt. The pattern ^sqrt$ matches only a column named exactly sqrt, which is what mutate_at produces when a single column contains other, as in Example 1. If more than one column contains other, as in Example 2, the new columns already carry the suffix, so setNames leaves the names unchanged.

library(dplyr)

# Example 1
df <- data.frame(var1 = 1:2, var2 = 4:5, other = 9)

df %>% 
  mutate_at(vars(contains("other")), funs("sqrt" = sqrt(.))) %>%
  setNames(sub("^sqrt$", "other_sqrt", names(.)))
#   var1 var2 other other_sqrt
# 1    1    4     9          3
# 2    2    5     9          3

# Example 2
df2 <- data.frame(var1 = 1:2, var2 = 4:5, other1 = 9, other2 = 16)

df2 %>% 
  mutate_at(vars(contains("other")), funs("sqrt" = sqrt(.))) %>%
  setNames(sub("^sqrt$", "other_sqrt", names(.)))
#   var1 var2 other1 other2 other1_sqrt other2_sqrt
# 1    1    4      9     16           3           4
# 2    2    5      9     16           3           4

Or we can design a function to check how many columns contain the string other before manipulating the data frame.

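mutate_sqrt <- function(df, string){
  # Match names the way contains() does by default: literal substring, ignoring case
  string_col <- names(df)[grepl(tolower(string), tolower(names(df)), fixed = TRUE)]
  df2 <- df %>% mutate_at(vars(contains(string)), funs("sqrt" = sqrt(.)))
  # With a single matched column the new column is named just "sqrt", so rename it
  if (length(string_col) == 1){
    df2 <- df2 %>% setNames(sub("^sqrt$", paste(string_col, "sqrt", sep = "_"), names(.)))
  }
  return(df2)
}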

mutate_sqrt(df, "other")
#   var1 var2 other other_sqrt
# 1    1    4     9          3
# 2    2    5     9          3

mutate_sqrt(df2, "other")
#   var1 var2 other1 other2 other1_sqrt other2_sqrt
# 1    1    4      9     16           3           4
# 2    2    5      9     16           3           4 
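
A note for newer dplyr versions: funs() has been deprecated since dplyr 0.8.0, and dplyr 1.0.0 added across(), whose .names argument lets you spell out the naming pattern yourself. The sketch below assumes dplyr 1.0.0 or later. It reuses df and df2 from the examples above, and res1 and res2 are just illustrative names. Because the pattern "{.col}_sqrt" is applied to every selected column, the result should not depend on how many columns contains() matches.

```r
library(dplyr)

df  <- data.frame(var1 = 1:2, var2 = 4:5, other = 9)
df2 <- data.frame(var1 = 1:2, var2 = 4:5, other1 = 9, other2 = 16)

# .names is a glue pattern; {.col} is replaced by each selected column's name,
# so the suffix is added whether one column or several are matched
res1 <- df  %>% mutate(across(contains("other"), sqrt, .names = "{.col}_sqrt"))
res2 <- df2 %>% mutate(across(contains("other"), sqrt, .names = "{.col}_sqrt"))

names(res1)
# "var1" "var2" "other" "other_sqrt"
names(res2)
# "var1" "var2" "other1" "other2" "other1_sqrt" "other2_sqrt"
```

With this approach no renaming step is needed afterwards, at the cost of requiring a more recent dplyr than the one used in the question.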
www answered Oct 07 '22 01:10


I just figured out a (not so clean) way to do it: I add an extra dummy variable to the dataset, with a name that ensures it will be selected, so that we never fall into the one-variable case. After the calculation I remove the two dummy columns (other_fake and other_fake_sqrt), like this:

df %>% mutate(other_fake=NA) %>% 
  mutate_at(vars(contains("other")), .funs=funs('sqrt'=sqrt)) %>% 
  select(-contains("other_fake"))
####   var1 var2 other other_sqrt
#### 1    1    4     9          3
#### 2    2    5     9          3
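
The same result can probably be had without the dummy column by building the new names explicitly in base R. This is only a sketch: add_sqrt_cols is a hypothetical helper name, not part of dplyr, and it imitates the default matching of contains() (literal substring, case-insensitive) with grepl().

```r
df  <- data.frame(var1 = 1:2, var2 = 4:5, other = 9)
df2 <- data.frame(var1 = 1:2, var2 = 4:5, other1 = 9, other2 = 16)

# Hypothetical helper: append "<name>_sqrt" for every column whose name contains `string`
add_sqrt_cols <- function(df, string) {
  # Literal, case-insensitive substring match on the column names
  cols <- names(df)[grepl(tolower(string), tolower(names(df)), fixed = TRUE)]
  if (length(cols) == 0) return(df)  # nothing matched: return the data unchanged
  # Name the new columns ourselves, so the one-column case is not special
  df[paste0(cols, "_sqrt")] <- lapply(df[cols], sqrt)
  df
}

out1 <- add_sqrt_cols(df, "other")
out2 <- add_sqrt_cols(df2, "other")

names(out1)
# "var1" "var2" "other" "other_sqrt"
names(out2)
# "var1" "var2" "other1" "other2" "other1_sqrt" "other2_sqrt"
```

Since the names come from paste0() rather than from mutate_at's naming rule, there is nothing to clean up afterwards.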
agenis answered Oct 06 '22 23:10