
Using functions of multiple columns in a dplyr mutate_at call

Tags: r, dplyr

I'd like to use dplyr's mutate_at function to apply a function to several columns in a dataframe, where the function takes as input both the column it is applied to and another column in the dataframe.

As a concrete example, I'd like to mutate the following dataframe

# Example input dataframe
df <- data.frame(
    x = c(TRUE, TRUE, FALSE),
    y = c("Hello", "Hola", "Ciao"),
    z = c("World", "ao", "HaOlam")
)

with a mutate_at call that looks similar to this

df %>% mutate_at(.vars = vars(y, z),
                 .funs = ifelse(x, ., NA))

to return a dataframe that looks something like this

# Desired output dataframe
df2 <- data.frame(x = c(TRUE, TRUE, FALSE),
                  y_1 = c("Hello", "Hola", NA),
                  z_1 = c("World", "ao", NA))

The desired mutate_at call would be similar to the following call to mutate:

df %>%
   mutate(y_1 = ifelse(x, y, NA),
          z_1 = ifelse(x, z, NA))

I know that this can be done in base R in several ways, but I would specifically like to accomplish this goal using dplyr's mutate_at function for the sake of readability, interfacing with databases, etc.

Below are some similar questions asked on Stack Overflow that do not address the question I posed here:

adding multiple columns in a dplyr mutate call

dplyr::mutate to add multiple values

Use of column inside sum() function using dplyr's mutate() function

bschneidr asked Aug 29 '16 15:08



2 Answers

This was answered by @eipi10 in a comment on the question, but I'm writing it up here for posterity.

The solution here is to use:

df %>%
   mutate_at(.vars = vars(y, z),
             .funs = list(~ ifelse(x, ., NA)))

You can also use the new across() function with mutate(), like so:

df %>%
   mutate(across(c(y, z), ~ ifelse(x, ., NA)))

The use of the formula operator (as in ~ ifelse(...)) here indicates that ifelse(x, ., NA) is an anonymous function that is being defined within the call to mutate_at().

This works similarly to defining the function outside of the call to mutate_at(), like so:

temp_fn <- function(input) ifelse(test = df[["x"]],
                                  yes = input,
                                  no = NA)

df %>%
   mutate_at(.vars = vars(y, z),
             .funs = temp_fn)
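One difference between the two versions is worth illustrating: temp_fn reads x from the global df, whereas the formula version looks x up in the data being mutated. The sketch below is only an illustration, assuming a dplyr version where mutate_at() is still available; the row filter and the names sub, ok and failed are made up for the example.

```r
library(dplyr)

df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam"),
  stringsAsFactors = FALSE
)

# External helper from the answer: it reads x from the global df.
temp_fn <- function(input) ifelse(test = df[["x"]], yes = input, no = NA)

# Drop one row so the piped data no longer lines up with the global df.
sub <- df %>% filter(y != "Hola")

# The formula version finds x in the data being mutated, so it still works.
ok <- sub %>% mutate_at(vars(y, z), list(~ ifelse(x, ., NA)))

# The helper pairs a length-3 x with length-2 columns; dplyr is expected
# to reject the mismatched result, which we record instead of stopping.
failed <- tryCatch(
  {
    sub %>% mutate_at(vars(y, z), temp_fn)
    FALSE
  },
  error = function(e) TRUE
)
```

For anything beyond a one-off script, the formula version is therefore the safer choice, because it does not depend on which object happens to be called df.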

Note on syntax changes in dplyr: Prior to dplyr version 0.8.0, you would simply write .funs = funs(ifelse(x, . , NA)), but funs() has been deprecated since 0.8.0 and is slated for removal, so prefer the list(~ ...) form shown above.
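To check that these spellings agree on a given installation, a quick comparison along the following lines may help. It is a sketch that assumes dplyr >= 1.0 (for across()) with mutate_at() still available; the names via_mutate, via_mutate_at and via_across are illustrative only.

```r
library(dplyr)

df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam"),
  stringsAsFactors = FALSE
)

# Explicit version: one ifelse() per column, overwriting y and z.
via_mutate <- df %>%
  mutate(y = ifelse(x, y, NA),
         z = ifelse(x, z, NA))

# Scoped verb with a formula lambda wrapped in list().
via_mutate_at <- df %>%
  mutate_at(.vars = vars(y, z),
            .funs = list(~ ifelse(x, ., NA)))

# across() inside mutate().
via_across <- df %>%
  mutate(across(c(y, z), ~ ifelse(x, ., NA)))

# Both comparisons should report that the results match.
all.equal(via_mutate, via_mutate_at)
all.equal(via_mutate, via_across)
```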

bschneidr answered Oct 12 '22 17:10


To supplement the previous response, if you want mutate_at() to add new variables instead of replacing the existing ones, with names such as z_1 and y_1 as in the original question, the syntax depends on your dplyr version:

  • dplyr >= 1.0 with across(): add .names = "{.col}_1", or alternatively use list(`1` = ~ifelse(x, ., NA)) (note the backticks around the name)
  • dplyr >= 0.8 and < 1.0: use list(`1` = ~ifelse(x, ., NA))
  • dplyr < 0.8: use funs(`1` = ifelse(x, ., NA))

library(tidyverse)

df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam")
)

## Version >=1
df %>%
  mutate(across(c(y, z),
                list(~ifelse(x, ., NA)),
                .names="{.col}_1"))
#>       x     y      z   y_1   z_1
#> 1  TRUE Hello  World Hello World
#> 2  TRUE  Hola     ao  Hola    ao
#> 3 FALSE  Ciao HaOlam  <NA>  <NA>

## 0.8 - <1
df %>%
  mutate_at(.vars = vars(y, z),
            .funs = list(`1`=~ifelse(x, ., NA)))
#>       x     y      z   y_1   z_1
#> 1  TRUE Hello  World Hello World
#> 2  TRUE  Hola     ao  Hola    ao
#> 3 FALSE  Ciao HaOlam  <NA>  <NA>

## Before 0.8
df %>%
  mutate_at(.vars = vars(y, z),
            .funs = funs(`1`=ifelse(x, ., NA)))
#> Warning: `funs()` is deprecated as of dplyr 0.8.0.
#> Please use a list of either functions or lambdas:
#>
#>   # Simple named list:
#>   list(mean = mean, median = median)
#>
#>   # Auto named with `tibble::lst()`:
#>   tibble::lst(mean, median)
#>
#>   # Using lambdas
#>   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#>       x     y      z   y_1   z_1
#> 1  TRUE Hello  World Hello World
#> 2  TRUE  Hola     ao  Hola    ao
#> 3 FALSE  Ciao HaOlam  <NA>  <NA>

Created on 2020-10-03 by the reprex package (v0.3.0)
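If a more descriptive suffix than _1 is wanted, the .names template in across() can also refer to the function's name through {.fn}. The following is a minimal sketch assuming dplyr >= 1.0; the function name masked and the result name named are made up for the example.

```r
library(dplyr)

df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam"),
  stringsAsFactors = FALSE
)

# Name the lambda in the list; {.col} and {.fn} are filled in per column,
# so the new columns are appended and the originals are kept.
named <- df %>%
  mutate(across(c(y, z),
                list(masked = ~ ifelse(x, ., NA)),
                .names = "{.col}_{.fn}"))

names(named)
```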

For more details and tricks, see: Create new variables with mutate_at while keeping the original ones

Matifou answered Oct 12 '22 18:10