Consider this simple example:
library(dplyr) dataframe <- data_frame(helloo = c(1,2,3,4,5,6), ooooHH = c(1,1,1,2,2,2), ahaaa = c(200,400,120,300,100,100)) # A tibble: 6 x 3 helloo ooooHH ahaaa <dbl> <dbl> <dbl> 1 1 1 200 2 2 1 400 3 3 1 120 4 4 2 300 5 5 2 100 6 6 2 100
Here I want to apply the function ntile
to all the columns that contains oo
, but I would like these new columns to be called cat
+ the corresponding column.
I know I can do this
dataframe %>% mutate_at(vars(contains('oo')), .funs = funs(ntile(., 2))) # A tibble: 6 x 3 helloo ooooHH ahaaa <int> <int> <dbl> 1 1 1 200 2 1 1 400 3 1 1 120 4 2 2 300 5 2 2 100 6 2 2 100
But what I need is this
# A tibble: 8 x 5 helloo ooooHH ahaaa cat_helloo cat_ooooHH <dbl> <dbl> <dbl> <int> <int> 1 1 1 200 1 1 2 2 1 400 1 1 3 3 1 120 1 1 4 4 2 300 2 2 5 5 2 100 2 2 6 5 2 100 2 2 7 6 2 100 2 2 8 6 2 100 2 2
Is there a solution that does NOT require to store the intermediate data, and merge back to the original dataframe?
mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones.
mutate() is a dplyr function that adds new variables and preserves existing ones.
Update 2020-06 for dplyr 1.0.0
Starting in dplyr 1.0.0, the across()
function supersedes the "scoped variants" of functions such as mutate_at()
. The code should look pretty familiar within across()
, which is nested inside mutate()
.
Adding a name to the function(s) you give in the list adds the function name as a suffix.
dataframe %>% mutate( across(contains('oo'), .fns = list(cat = ~ntile(., 2))) ) # A tibble: 6 x 5 helloo ooooHH ahaaa helloo_cat ooooHH_cat <dbl> <dbl> <dbl> <int> <int> 1 1 1 200 1 1 2 2 1 400 1 1 3 3 1 120 1 1 4 4 2 300 2 2 5 5 2 100 2 2 6 6 2 100 2 2
Changing the new columns names is a little easier in 1.0.0 with the .names
argument in across()
. Here is an example of adding the function name as a prefix instead of a suffix. This uses glue syntax.
dataframe %>% mutate( across(contains('oo'), .fns = list(cat = ~ntile(., 2)), .names = "{fn}_{col}" ) ) # A tibble: 6 x 5 helloo ooooHH ahaaa cat_helloo cat_ooooHH <dbl> <dbl> <dbl> <int> <int> 1 1 1 200 1 1 2 2 1 400 1 1 3 3 1 120 1 1 4 4 2 300 2 2 5 5 2 100 2 2 6 6 2 100 2 2
Original answer with mutate_at()
Edited to reflect changes in dplyr. As of dplyr 0.8.0, funs()
is deprecated and list()
with ~
should be used instead.
You can give names to the functions to the list you pass to .funs
to make new variables with the names as suffixes attached.
dataframe %>% mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2))) # A tibble: 6 x 5 helloo ooooHH ahaaa helloo_cat ooooHH_cat <dbl> <dbl> <dbl> <int> <int> 1 1 1 200 1 1 2 2 1 400 1 1 3 3 1 120 1 1 4 4 2 300 2 2 5 5 2 100 2 2 6 6 2 100 2 2
If you want it as a prefix instead, you could then use rename_at
to change the names.
dataframe %>% mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2))) %>% rename_at( vars( contains( "_cat") ), list( ~paste("cat", gsub("_cat", "", .), sep = "_") ) ) # A tibble: 6 x 5 helloo ooooHH ahaaa cat_helloo cat_ooooHH <dbl> <dbl> <dbl> <int> <int> 1 1 1 200 1 1 2 2 1 400 1 1 3 3 1 120 1 1 4 4 2 300 2 2 5 5 2 100 2 2 6 6 2 100 2 2
Previous code with funs()
from earlier versions of dplyr:
dataframe %>% mutate_at(vars(contains('oo')), .funs = funs(cat = ntile(., 2))) %>% rename_at( vars( contains( "_cat") ), funs( paste("cat", gsub("_cat", "", .), sep = "_") ) )
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With