 

R dplyr: rename variables using string functions

(Somewhat related question: Enter new column names as string in dplyr's rename function)

In the middle of a dplyr chain (%>%), I would like to replace multiple column names with functions of their old names (using tolower, gsub, etc.).

library(tidyr); library(dplyr)
data(iris)
# This is what I want to do, but I'd like to use dplyr syntax
names(iris) <- tolower( gsub("\\.", "_", names(iris) ) )
glimpse(iris, 60)
# Observations: 150
# Variables:
#   $ sepal_length (dbl) 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6,...
#   $ sepal_width  (dbl) 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4,...
#   $ petal_length (dbl) 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4,...
#   $ petal_width  (dbl) 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3,...
#   $ species      (fctr) setosa, setosa, setosa, setosa, s...

# the rest of the chain:
iris %>% gather(measurement, value, -species) %>%
  group_by(species,measurement) %>%
  summarise(avg_value = mean(value)) 

I see ?rename takes the argument replace as a named character vector, with new names as values, and old names as names.

So I tried:

iris %>% rename(replace=c(names(iris)=tolower( gsub("\\.", "_", names(iris) ) )  ))

but this (a) returns Error: unexpected '=' in "iris %>% ..." and (b) requires referring by name to the data frame produced by the previous step in the chain, which I can't do in my real use case.
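On point (a): inside c(), the name on the left of = has to be a literal symbol or string, so a call such as names(iris) there is a parse error. Base R's setNames() attaches computed names instead. Below is a minimal sketch that builds the vector in the shape described above for ?rename (old names as names, new names as values). It uses gsub(".", "_", ..., fixed = TRUE), which replaces the same literal dot as the "\\." regex. Note that it does not solve point (b), because it still refers to iris by name.

```r
# Compute the new names from the old ones with plain string functions.
old_names <- names(iris)
new_names <- tolower(gsub(".", "_", old_names, fixed = TRUE))

# c(names(iris) = ...) cannot be parsed, but setNames() can attach computed
# names: here old names become the names and new names the values.
replace_vec <- setNames(new_names, old_names)
print(replace_vec)
```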

iris %>% 
  rename(replace=c(    )) %>% # ideally the fix would go here
  gather(measurement, value, -species) %>%
  group_by(species,measurement) %>%
  summarise(avg_value = mean(value)) # I realize I could mutate down here 
                                     #  instead, once the column names turn into values, 
                                     #  but that's not the point
# ---- Desired output looks like: -------
# Source: local data frame [12 x 3]
# Groups: species
# 
#       species  measurement avg_value
# 1      setosa sepal_length     5.006
# 2      setosa  sepal_width     3.428
# 3      setosa petal_length     1.462
# 4      setosa  petal_width     0.246
# 5  versicolor sepal_length     5.936
# 6  versicolor  sepal_width     2.770
# ... etc ....  
C8H10N4O2 asked May 21 '15 19:05




4 Answers

This is a very late answer (May 2017).

As of dplyr 0.5.0.9004, soon to be 0.6.0 (eventually released as dplyr 0.7.0), several new ways of renaming columns that work with the magrittr pipe operator %>% have been added to the package.

Those functions are:

  • rename_all
  • rename_if
  • rename_at

There are many different ways of using those functions, but the one relevant to your problem, using the stringr package, is the following:

df <- df %>%
  rename_all(
    funs(
      stringr::str_to_lower(.) %>%
        stringr::str_replace_all(., '\\.', '_')
    )
  )

And so, carry on with the plumbing :) (no pun intended).
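The funs() wrapper used above was later deprecated in dplyr. A minimal sketch of the same idea on iris, assuming a dplyr version in which rename_all() accepts a purrr-style lambda directly (0.8.0 or later); base gsub() and tolower() stand in for the stringr calls:

```r
library(dplyr)

# rename_all() applies the lambda to the character vector of all column
# names; .x refers to that vector. No funs() wrapper is needed.
renamed <- iris %>%
  rename_all(~ tolower(gsub(".", "_", .x, fixed = TRUE)))

names(renamed)
```

In current dplyr, rename_all() is itself superseded by rename_with(), but it still runs.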

Guilherme Marthe answered Sep 20 '22 05:09


I think you're looking at the documentation for plyr::rename, not dplyr::rename. You would do something like this with rename_(), the standard-evaluation variant of dplyr::rename:

iris %>% rename_(.dots=setNames(names(.), tolower(gsub("\\.", "_", names(.))))) 
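rename_() and its .dots argument have since been deprecated in favour of tidy evaluation. A sketch of the same idea with plain rename(): build a c(new = "old") vector and splice it in with the !!! operator. The helper rename_by_fn() is hypothetical (not part of dplyr); wrapping the logic in a function lets it sit in the middle of a chain without naming the incoming data frame.

```r
library(dplyr)

# Hypothetical helper, not a dplyr function: applies fn to the current
# column names and renames via a spliced c(new_name = "old_name") vector.
rename_by_fn <- function(df, fn) {
  old <- names(df)
  rename(df, !!!setNames(old, fn(old)))
}

renamed <- iris %>%
  rename_by_fn(function(x) tolower(gsub(".", "_", x, fixed = TRUE)))

names(renamed)
```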
Matthew Plourde answered Sep 19 '22 05:09


Here's a way around the somewhat awkward rename syntax:

myris <- iris %>% setNames(tolower(gsub("\\.","_",names(.))))
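This works mid-chain because magrittr only suppresses its default first-argument insertion when the dot is a direct argument. Here `.` appears only inside nested calls, so the pipe still evaluates setNames(., <new names>). A sketch continuing the chain on the renamed columns (a dplyr-only summary, chosen here for illustration rather than the full gather() pipeline from the question):

```r
library(dplyr)

# `.` is used only inside names(.), so the piped data frame is still passed
# as setNames()'s first argument; later steps see the new lower_case names.
out <- iris %>%
  setNames(tolower(gsub(".", "_", names(.), fixed = TRUE))) %>%
  group_by(species) %>%
  summarise(avg_sepal_length = mean(sepal_length))

out
```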
Frank answered Sep 21 '22 05:09


As of 2020 (dplyr 1.0.0), rename_if(), rename_at() and rename_all() are marked superseded. The up-to-date dplyr way to tackle this is rename_with():

iris %>% rename_with(tolower)

or a more complex version:

iris %>% 
  rename_with(stringr::str_replace, 
              pattern = "Length", replacement = "len", 
              matches("Length"))

(edit 2021-09-08)
As @a_leemo pointed out in a comment, this notation does not appear verbatim in the manual. Rather, one would deduce the following from the manual:

iris %>% 
  rename_with(~ stringr::str_replace(.x, 
                                     pattern = "Length", 
                                     replacement = "len"), 
              matches("Length")) 

Both do the same thing, but I find the first solution a bit more readable. In the first example, pattern = ... and replacement = ... are passed through the ... (dots) argument of rename_with() and forwarded to the function. For more details see ?rename_with and ?dots.
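The same forwarding works with base R functions. rename_with() calls .fn on the selected names and passes along the extra arguments, so named pattern/replacement/fixed arguments reach gsub() and the vector of names fills gsub()'s remaining x argument. A sketch that reproduces the exact renaming from the question, without stringr:

```r
library(dplyr)

# Step 1: gsub(<names>, pattern = ".", replacement = "_", fixed = TRUE);
# the named arguments are matched first, so the names vector becomes x.
# Step 2: lower-case every name.
renamed <- iris %>%
  rename_with(gsub, pattern = ".", replacement = "_", fixed = TRUE) %>%
  rename_with(tolower)

names(renamed)
```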

loki answered Sep 22 '22 05:09