I'm working with an imported data set that corresponds to the extract below:
set.seed(1)
dta <- data.frame("This is Column One" = runif(n = 10),
"Another amazing Column name" = runif(n = 10),
"!## This Columns is so special€€€" = runif(n = 10),
check.names = FALSE)
I'm doing some cleaning on this data using dplyr, and I would like to change the column names to syntactically correct ones and then remove the punctuation as a second step. What I tried so far:
dta_cln <- dta %>%
rename(make.names(names(dta)))
generates an error:
> dta_clean <- dta %>%
+   rename(make.names(names(dta)))
Error: All arguments to rename must be named.
What I want to achieve can be done in base R:
names(dta) <- gsub("[[:punct:]]","",make.names(names(dta)))
which would return:
> names(dta)
[1] "ThisisColumnOne"          "AnotheramazingColumnname" "XThisColumnsissospecial"
I want to achieve the same effect, but using dplyr and %>%.
To change multiple column names by name or by index, use the rename() function from the dplyr package; setnames() from data.table can also rename columns given their old and new names. In base R, the colnames() and names() functions can be used to rename a data frame column by index or by name.
The rename() function from dplyr uses the syntax rename(new_column_name = old_column_name) to change a column from its old name to a new one; for example, rename(c1 = id) renames the column id to c1. The pipe operator %>% passes the data frame into rename() as its first argument.
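A minimal sketch of rename(), assuming a hypothetical data frame df with an id column (df, value, df_renamed and df_by_index are illustrative names, not from the question). Since rename() also accepts column positions, the same rename can be done by index:

```r
library(dplyr)

# Hypothetical example data with an "id" column
df <- data.frame(id = 1:3, value = c(10, 20, 30))

# rename(new_name = old_name); %>% passes df in as the first argument
df_renamed <- df %>%
  rename(c1 = id)

# The old column can also be referred to by its position
df_by_index <- df %>%
  rename(c1 = 1)

names(df_renamed)   # "c1" "value"
names(df_by_index)  # "c1" "value"
```

rename() keeps the column order, so only the name of the first column changes.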
The easiest way to rename columns in R is by using the setnames() function from the data.table package. This function modifies the column names in place, given a set of old names and a set of new names. Alternatively, you can use the colnames() function or the dplyr package.
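A short sketch of setnames(), assuming the data.table package is installed; dt is a hypothetical table. Because setnames() works by reference, no reassignment is needed, and `old` can be given either as names or as column positions:

```r
library(data.table)

# Hypothetical example table with columns "id" and "value"
dt <- data.table(id = 1:3, value = c(10, 20, 30))

# setnames(x, old, new) renames columns in place (by reference)
setnames(dt, old = "id", new = "c1")

# `old` may also be a column position
setnames(dt, old = 2, new = "c2")

names(dt)  # "c1" "c2"
```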
Rename Column using colnames()
colnames() is the base R function used to rename the columns (variables) of a data frame. With it you can rename a column by index or by name. Alternatively, you can use the names() function.
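A base R sketch, with no packages needed, renaming by index and by name on a hypothetical data frame df (all column names here are illustrative):

```r
# Hypothetical data frame with columns "id", "name" and "score"
df <- data.frame(id = 1:3, name = c("a", "b", "c"), score = c(5, 7, 9))

# Rename by index: the second column becomes "label"
colnames(df)[2] <- "label"

# Rename by name: replace the element equal to "score"
colnames(df)[colnames(df) == "score"] <- "points"

# names() works the same way on a data frame
names(df)[names(df) == "id"] <- "key"

names(df)  # "key" "label" "points"
```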
I know this is an old question, and I'm sure you found the solution by now, but I stumbled here searching for the same question, and ultimately found a few new ways to do this.
Using dplyr 0.6.0 and above, there is now a rename_all() function:
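library(dplyr)
# Apply the cleaning function to the current names instead of names(dta),
# and pass a plain function rather than the deprecated funs()
dta %>%
  rename_all(function(x) gsub("[[:punct:]]", "", make.names(x)))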
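In dplyr 1.0.0 and later, rename_all(), rename_at() and rename_if() are superseded by rename_with(). A minimal sketch of the same renaming with rename_with(), assuming dplyr >= 1.0.0; clean_col() is a hypothetical helper introduced here:

```r
library(dplyr)

set.seed(1)
dta <- data.frame("This is Column One" = runif(n = 10),
                  "Another amazing Column name" = runif(n = 10),
                  "!## This Columns is so special€€€" = runif(n = 10),
                  check.names = FALSE)

# Hypothetical helper applied to the vector of current column names:
# make.names() makes them syntactically valid, gsub() strips punctuation
clean_col <- function(x) gsub("[[:punct:]]", "", make.names(x))

# rename_with() passes the names of the selected columns (all by default) to .fn
dta_cln <- dta %>%
  rename_with(clean_col)

names(dta_cln)
```

Because the helper receives the names coming through the pipe, it keeps working if other steps (e.g. select()) run before it.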
Which works, but it's a little messy to me. If you want more flexibility with dplyr, you can also call on:
rename_at() (rename only the columns you select)
rename_if() (rename only the columns that satisfy a predicate)
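A sketch of both scoped variants on a hypothetical data frame df (the column names and the clean_col() helper are illustrative), plus the rename_with() form that covers the same case in dplyr >= 1.0.0:

```r
library(dplyr)

# Hypothetical data: one character column and two numeric columns
df <- data.frame("Sample ID" = c("a", "b"),
                 "Weight (kg)" = c(1.5, 2.0),
                 "Height (cm)" = c(150, 160),
                 check.names = FALSE)

clean_col <- function(x) gsub("[[:punct:]]", "", make.names(x))

# rename_at(): apply the function only to the selected columns
df_at <- df %>% rename_at(vars(starts_with("Weight")), clean_col)

# rename_if(): apply it to columns whose values satisfy a predicate
df_if <- df %>% rename_if(is.numeric, clean_col)

# rename_with() equivalent of the rename_at() call (dplyr >= 1.0.0)
df_with <- df %>% rename_with(clean_col, .cols = starts_with("Weight"))

names(df_at)  # "Sample ID" "Weightkg" "Height (cm)"
names(df_if)  # "Sample ID" "Weightkg" "Heightcm"
```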
The janitor package is a pretty nice package (with plenty of additional utility) that can easily clean up column names:
library(janitor)
dta %>%
clean_names()
Which will rename and clean all column names to the following:
[1] "this_is_column_one" "another_amazing_column_name" "x_this_columns_is_so_special"
Everything becomes snake_case rather than CamelCase, but overall clean_names() is very flexible in the column names it handles. If that IS a deal breaker, you can use yet another package, snakecase, for its function to_big_camel_case() within the rename_all() function, although that is starting to get a little too esoteric.
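A minimal sketch of that combination, assuming both dplyr and snakecase are installed. Only the first two resulting names are spelled out below; how the leading "!##" and trailing "€€€" of the third name are handled depends on snakecase's parsing defaults:

```r
library(dplyr)
library(snakecase)

set.seed(1)
dta <- data.frame("This is Column One" = runif(n = 10),
                  "Another amazing Column name" = runif(n = 10),
                  "!## This Columns is so special€€€" = runif(n = 10),
                  check.names = FALSE)

# to_big_camel_case() receives the vector of current column names;
# rename_with(to_big_camel_case) would do the same in dplyr >= 1.0.0
dta_camel <- dta %>%
  rename_all(to_big_camel_case)

names(dta_camel)[1:2]  # "ThisIsColumnOne" "AnotherAmazingColumnName"
```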