I have been playing with dplyr::mutate_at
to create new variables by applying the same function to some of the columns. When I name my function in the .funs
argument, the mutate call creates new columns with a suffix instead of replacing the existing ones, which is a cool option that I discovered in this thread.
df = data.frame(var1=1:2, var2=4:5, other=9)
df %>% mutate_at(vars(contains("var")), .funs=funs('sqrt'=sqrt))
#### var1 var2 other var1_sqrt var2_sqrt
#### 1 1 4 9 1.000000 2.000000
#### 2 2 5 9 1.414214 2.236068
However, I noticed that when the vars
argument used to point my columns returns only one column instead of several, the resulting new column drops the initial name: it gets named sqrt
instead of other_sqrt
here:
df %>% mutate_at(vars(contains("other")), .funs=funs('sqrt'=sqrt))
#### var1 var2 other sqrt
#### 1 1 4 9 3
#### 2 2 5 9 3
I would like to understand why this behaviour happens, and how to avoid it because I don't know in advance how many columns the contains()
will return.
EDIT: The newly created columns must inherit the original name of the original columns, plus the suffix 'sqrt' at the end.
Thanks
Here is another idea. We can add setNames(sub("^sqrt$", "other_sqrt", names(.)))
after the mutate_at
call. The idea is to replace the column name sqrt
with other_sqrt
. The pattern ^sqrt$
should only match the derived column sqrt
if there is only one column named other
, which is demonstrated in Example 1. If there are more than one columns with other
, such as Example 2, the setNames
would not change the column names.
library(dplyr)
# Example 1
df <- data.frame(var1 = 1:2, var2 = 4:5, other = 9)
df %>%
mutate_at(vars(contains("other")), funs("sqrt" = sqrt(.))) %>%
setNames(sub("^sqrt$", "other_sqrt", names(.)))
# var1 var2 other other_sqrt
# 1 1 4 9 3
# 2 2 5 9 3
# Example 2
df2 <- data.frame(var1 = 1:2, var2 = 4:5, other1 = 9, other2 = 16)
df2 %>%
mutate_at(vars(contains("other")), funs("sqrt" = sqrt(.))) %>%
setNames(sub("^sqrt$", "other_sqrt", names(.)))
# var1 var2 other1 other2 other1_sqrt other2_sqrt
# 1 1 4 9 16 3 4
# 2 2 5 9 16 3 4
Or we can design a function to check how many columns contain the string other
before manipulating the data frame.
mutate_sqrt <- function(df, string){
string_col <- grep(string, names(df), value = TRUE)
df2 <- df %>% mutate_at(vars(contains(string)), funs("sqrt" = sqrt(.)))
if (length(string_col) == 1){
df2 <- df2 %>% setNames(sub("^sqrt$", paste(string_col, "sqrt", sep = "_"), names(.)))
}
return(df2)
}
mutate_sqrt(df, "other")
# var1 var2 other other_sqrt
# 1 1 4 9 3
# 2 2 5 9 3
mutate_sqrt(df2, "other")
# var1 var2 other1 other2 other1_sqrt other2_sqrt
# 1 1 4 9 16 3 4
# 2 2 5 9 16 3 4
I just figured out a (not so clean) way to do it; I add a extra dummy variable to the dataset, with a name that ensures that it will be selected and that we don't fall into the 1-variable case, and after the calculation I remove the 2 dummies, like this:
df %>% mutate(other_fake=NA) %>%
mutate_at(vars(contains("other")), .funs=funs('sqrt'=sqrt)) %>%
select(-contains("other_fake"))
#### var1 var2 other other_sqrt
#### 1 1 4 9 3
#### 2 2 5 9 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With