
Create new variables with mutate_at while keeping the original ones

Tags: r, dplyr

Consider this simple example:

    library(dplyr)

    # tibble() replaces the deprecated data_frame()
    dataframe <- tibble(helloo = c(1,2,3,4,5,6),
                        ooooHH = c(1,1,1,2,2,2),
                        ahaaa = c(200,400,120,300,100,100))

    # A tibble: 6 x 3
      helloo ooooHH ahaaa
       <dbl>  <dbl> <dbl>
    1      1      1   200
    2      2      1   400
    3      3      1   120
    4      4      2   300
    5      5      2   100
    6      6      2   100

Here I want to apply the function ntile to all the columns whose names contain oo, but I would like the new columns to be named cat_ plus the corresponding column name.

I know I can do this:

    dataframe %>% mutate_at(vars(contains('oo')), .funs = funs(ntile(., 2)))

    # A tibble: 6 x 3
      helloo ooooHH ahaaa
       <int>  <int> <dbl>
    1      1      1   200
    2      1      1   400
    3      1      1   120
    4      2      2   300
    5      2      2   100
    6      2      2   100

But what I need is this:

    # A tibble: 6 x 5
      helloo ooooHH ahaaa cat_helloo cat_ooooHH
       <dbl>  <dbl> <dbl>      <int>      <int>
    1      1      1   200          1          1
    2      2      1   400          1          1
    3      3      1   120          1          1
    4      4      2   300          2          2
    5      5      2   100          2          2
    6      6      2   100          2          2

Is there a solution that does NOT require storing the intermediate data and merging it back into the original dataframe?

ℕʘʘḆḽḘ asked Aug 29 '17 20:08




1 Answer

Update 2020-06 for dplyr 1.0.0

Starting in dplyr 1.0.0, the across() function supersedes the "scoped variants" of functions such as mutate_at(). The code should look pretty familiar within across(), which is nested inside mutate().

Adding a name to the function(s) you give in the list adds the function name as a suffix.

    dataframe %>%
        mutate( across(contains('oo'),
                       .fns = list(cat = ~ntile(., 2))) )

    # A tibble: 6 x 5
      helloo ooooHH ahaaa helloo_cat ooooHH_cat
       <dbl>  <dbl> <dbl>      <int>      <int>
    1      1      1   200          1          1
    2      2      1   400          1          1
    3      3      1   120          1          1
    4      4      2   300          2          2
    5      5      2   100          2          2
    6      6      2   100          2          2
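To make the suffix rule concrete, here is a minimal sketch with two named functions in the list. The second entry, `rank`, is a hypothetical addition for illustration only and is not part of the question; `.x` is used in place of `.` inside the lambdas, which behaves the same way here.

```r
library(dplyr)

dataframe <- tibble(
  helloo = c(1, 2, 3, 4, 5, 6),
  ooooHH = c(1, 1, 1, 2, 2, 2),
  ahaaa  = c(200, 400, 120, 300, 100, 100)
)

# Each name in the list becomes a suffix on every selected column.
# `rank` is a hypothetical second function added only to show the pattern.
multi <- dataframe %>%
  mutate(across(contains("oo"),
                .fns = list(cat  = ~ ntile(.x, 2),
                            rank = ~ min_rank(.x))))

# New columns are grouped by input column: all functions for helloo first,
# then all functions for ooooHH.
names(multi)
```

The original three columns are kept untouched, so no intermediate data frame or join is needed.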

Changing the new column names is a little easier in 1.0.0 with the .names argument in across(). Here is an example of adding the function name as a prefix instead of a suffix. This uses glue syntax.

    dataframe %>%
        mutate( across(contains('oo'),
                       .fns = list(cat = ~ntile(., 2)),
                       .names = "{fn}_{col}" ) )

    # A tibble: 6 x 5
      helloo ooooHH ahaaa cat_helloo cat_ooooHH
       <dbl>  <dbl> <dbl>      <int>      <int>
    1      1      1   200          1          1
    2      2      1   400          1          1
    3      3      1   120          1          1
    4      4      2   300          2          2
    5      5      2   100          2          2
    6      6      2   100          2          2
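When only one function is applied, a possible shortcut is to pass it unnamed and write the prefix literally in the glue string. The sketch below assumes a dplyr version whose documentation spells the placeholder as `{.col}` (the dotted form used in more recent releases); the object name `prefixed` is made up for this example.

```r
library(dplyr)

dataframe <- tibble(
  helloo = c(1, 2, 3, 4, 5, 6),
  ooooHH = c(1, 1, 1, 2, 2, 2),
  ahaaa  = c(200, 400, 120, 300, 100, 100)
)

# A single unnamed function: the literal "cat_" in .names supplies the prefix,
# and {.col} is replaced by each selected column name.
prefixed <- dataframe %>%
  mutate(across(contains("oo"),
                ~ ntile(.x, 2),
                .names = "cat_{.col}"))

names(prefixed)
```

Without `.names`, a single unnamed function would overwrite the selected columns in place, so the glue string is what preserves the originals here.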

Original answer with mutate_at()

Edited to reflect changes in dplyr. As of dplyr 0.8.0, funs() is deprecated and list() with ~ should be used instead.

You can name the functions in the list you pass to .funs; each name is attached as a suffix to the new variables.

    dataframe %>% mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2)))

    # A tibble: 6 x 5
      helloo ooooHH ahaaa helloo_cat ooooHH_cat
       <dbl>  <dbl> <dbl>      <int>      <int>
    1      1      1   200          1          1
    2      2      1   400          1          1
    3      3      1   120          1          1
    4      4      2   300          2          2
    5      5      2   100          2          2
    6      6      2   100          2          2

If you want it as a prefix instead, you could then use rename_at to change the names.

    dataframe %>%
        mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2))) %>%
        rename_at( vars( contains( "_cat") ), list( ~paste("cat", gsub("_cat", "", .), sep = "_") ) )

    # A tibble: 6 x 5
      helloo ooooHH ahaaa cat_helloo cat_ooooHH
       <dbl>  <dbl> <dbl>      <int>      <int>
    1      1      1   200          1          1
    2      2      1   400          1          1
    3      3      1   120          1          1
    4      4      2   300          2          2
    5      5      2   100          2          2
    6      6      2   100          2          2
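For readers on dplyr 1.0.0 or later, the same suffix-to-prefix renaming can probably be written with rename_with(), which supersedes rename_at(). This is a sketch under that version assumption, not part of the original answer; the object name `renamed` is made up for the example.

```r
library(dplyr)

dataframe <- tibble(
  helloo = c(1, 2, 3, 4, 5, 6),
  ooooHH = c(1, 1, 1, 2, 2, 2),
  ahaaa  = c(200, 400, 120, 300, 100, 100)
)

renamed <- dataframe %>%
  # Create helloo_cat and ooooHH_cat (function name as suffix)
  mutate(across(contains("oo"), list(cat = ~ ntile(.x, 2)))) %>%
  # Strip the trailing "_cat" and put "cat_" in front instead
  rename_with(~ paste0("cat_", sub("_cat$", "", .x)), ends_with("_cat"))

names(renamed)
```

Anchoring the pattern with `$` and selecting with ends_with() keeps the rename from touching a column that merely has "_cat" somewhere in the middle of its name.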

Previous code with funs() from earlier versions of dplyr:

    dataframe %>%
        mutate_at(vars(contains('oo')), .funs = funs(cat = ntile(., 2))) %>%
        rename_at( vars( contains( "_cat") ), funs( paste("cat", gsub("_cat", "", .), sep = "_") ) )
aosmith answered Oct 07 '22 12:10