I'd like to use dplyr's mutate_at function to apply a function to several columns in a dataframe, where the function takes as input both the column to which it is directly applied and another column in the dataframe.
As a concrete example, I'd like to mutate the following dataframe
# Example input dataframe
df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam")
)
with a mutate_at call that looks similar to this
df %>% mutate_at(.vars = vars(y, z), .funs = ifelse(x, ., NA))
to return a dataframe that looks something like this
# Desired output dataframe
df2 <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y_1 = c("Hello", "Hola", NA),
  z_1 = c("World", "ao", NA)
)
The desired mutate_at call would be similar to the following call to mutate:
df %>% mutate(y_1 = ifelse(x, y, NA), z_1 = ifelse(x, z, NA))
I know that this can be done in base R in several ways, but I would specifically like to accomplish this goal using dplyr's mutate_at function for the sake of readability, interfacing with databases, etc.
Below are some similar questions asked on Stack Overflow that do not address the question I posed here:
adding multiple columns in a dplyr mutate call
dplyr::mutate to add multiple values
Use of column inside sum() function using dplyr's mutate() function
This was answered by @eipi10 in a comment on the question, but I'm writing it here for posterity.
The solution here is to use:
df %>% mutate_at(.vars = vars(y, z), .funs = list(~ ifelse(x, ., NA)))
You can also use the new across() function with mutate(), like so:
df %>% mutate(across(c(y, z), ~ ifelse(x, ., NA)))
The use of the formula operator (as in ~ ifelse(...)) here indicates that ifelse(x, ., NA) is an anonymous function that is being defined within the call to mutate_at().
This works similarly to defining the function outside of the call to mutate_at(), like so:
temp_fn <- function(input) ifelse(test = df[["x"]], yes = input, no = NA)
df %>% mutate_at(.vars = vars(y, z), .funs = temp_fn)
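As a quick check of the equivalence described above, here is a minimal sketch comparing the formula shorthand with an explicit anonymous function written inline. It assumes dplyr >= 1.0 (for across()). The stringsAsFactors = FALSE argument and the names via_formula and via_function are additions for illustration only and are not part of the original example.

```r
library(dplyr)

df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam"),
  stringsAsFactors = FALSE  # assumption: keep y and z as character on R < 4.0
)

# Formula shorthand: `.` stands for the column currently being processed
via_formula <- df %>%
  mutate(across(c(y, z), ~ ifelse(x, ., NA)))

# Explicit anonymous function written inline: `col` plays the role of `.`
via_function <- df %>%
  mutate(across(c(y, z), function(col) ifelse(x, col, NA)))

# Both spellings should yield the same data frame
identical(via_formula, via_function)
```

One difference from temp_fn is worth noting: temp_fn reads x from the data frame named df in the calling environment, whereas the inline versions should look x up in the data being mutated, which matters if the pipeline filters or groups the data first.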
Note on syntax changes in dplyr: Prior to dplyr version 0.8.0, you would simply write .funs = funs(ifelse(x, . , NA)), but funs() has been deprecated since 0.8.0 and should be avoided in new code.
To supplement the previous response, if you wanted mutate_at() to add new variables (instead of replacing), with names such as z_1 and y_1 as in the original question, you just need to:
dplyr >= 1.0, with across(): add .names = "{.col}_1", or alternatively use list(`1` = ~ifelse(x, ., NA)) (back ticks!)
dplyr 0.8 up to (but not including) 1.0, with mutate_at(): use list(`1` = ~ifelse(x, ., NA))
dplyr before 0.8, with mutate_at(): use funs(`1` = ifelse(x, ., NA))
library(tidyverse)

df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam")
)

## Version >=1
df %>% mutate(across(c(y, z), list(~ifelse(x, ., NA)), .names="{.col}_1"))
#>       x     y      z   y_1   z_1
#> 1  TRUE Hello  World Hello World
#> 2  TRUE  Hola     ao  Hola    ao
#> 3 FALSE  Ciao HaOlam  <NA>  <NA>

## 0.8 - <1
df %>% mutate_at(.vars = vars(y, z), .funs = list(`1`=~ifelse(x, ., NA)))
#>       x     y      z   y_1   z_1
#> 1  TRUE Hello  World Hello World
#> 2  TRUE  Hola     ao  Hola    ao
#> 3 FALSE  Ciao HaOlam  <NA>  <NA>

## Before 0.8
df %>% mutate_at(.vars = vars(y, z), .funs = funs(`1`=ifelse(x, ., NA)))
#> Warning: `funs()` is deprecated as of dplyr 0.8.0.
#> Please use a list of either functions or lambdas:
#>
#>   # Simple named list:
#>   list(mean = mean, median = median)
#>
#>   # Auto named with `tibble::lst()`:
#>   tibble::lst(mean, median)
#>
#>   # Using lambdas
#>   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#>       x     y      z   y_1   z_1
#> 1  TRUE Hello  World Hello World
#> 2  TRUE  Hola     ao  Hola    ao
#> 3 FALSE  Ciao HaOlam  <NA>  <NA>
Created on 2020-10-03 by the reprex package (v0.3.0)
For more details and tricks, see: Create new variables with mutate_at while keeping the original ones
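The desired output in the question contains only x, y_1 and z_1, without the original y and z. One possible way to get exactly that shape, sketched here under the assumption of dplyr >= 1.0, is to drop the source columns after creating the suffixed ones. The names result and expected, and the stringsAsFactors = FALSE argument, are illustrative additions.

```r
library(dplyr)

df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam"),
  stringsAsFactors = FALSE  # assumption: keep strings as character on R < 4.0
)

result <- df %>%
  # add y_1 and z_1 alongside the original columns
  mutate(across(c(y, z), ~ ifelse(x, ., NA), .names = "{.col}_1")) %>%
  # then drop the originals so only x, y_1 and z_1 remain
  select(-c(y, z))

# The data frame the question asked for, rebuilt for comparison
expected <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y_1 = c("Hello", "Hola", NA),
  z_1 = c("World", "ao", NA),
  stringsAsFactors = FALSE
)

all.equal(result, expected)
```

An explicit select() is used rather than mutate's .keep argument because x also appears inside the ifelse() call, so an option that drops "used" columns would likely drop x as well.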