I am reading in a bunch of CSVs that have stuff like "sales - thousands" in the title and come into R as "sales...thousands". I'd like to use a regular expression (or other simple method) to clean these up.
I can't figure out why this doesn't work:
#mock data
a <- data.frame(this.is.fine = letters[1:5],
this...one...isnt = LETTERS[1:5])
#column names
colnames(a)
# [1] "this.is.fine" "this...one...isnt"
#function to collapse runs of periods into a single period
colClean <- function(x){
colnames(x) <- gsub("\\.\\.+", ".", colnames(x))
}
#run function
colClean(a)
#names go unaffected
colnames(a)
# [1] "this.is.fine" "this...one...isnt"
but this code does:
#direct change to names
colnames(a) <- gsub("\\.\\.+", ".", colnames(a))
#new names
colnames(a)
# [1] "this.is.fine" "this.one.isnt"
Note that I'm fine leaving one period between words when that occurs.
Thank you.
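The likely reason the function has no visible effect is that R functions work on a copy of their arguments. colClean() renames its local x, and because the last expression is an assignment, the function returns the new names vector (invisibly) instead of the modified data frame. The original a is never touched. A minimal sketch of a fix, assuming it is acceptable to reassign the result, is to return x and assign it back:

```r
# Mock data from the question
a <- data.frame(this.is.fine = letters[1:5],
                this...one...isnt = LETTERS[1:5])

# The function modifies its own copy of x, so that copy must be
# returned explicitly and reassigned by the caller.
colClean <- function(x) {
  colnames(x) <- gsub("\\.\\.+", ".", colnames(x))
  x  # return the renamed data frame, not the value of the assignment
}

a <- colClean(a)
colnames(a)
# colnames(a) is now c("this.is.fine", "this.one.isnt")
```

Returning the modified object and reassigning it is the usual R idiom; functions are not expected to change objects in the caller's environment.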
The easiest option for replacing spaces in column names is the clean_names() function from the janitor package. It creates syntactically valid column names by converting them to lowercase snake_case, replacing blanks (and the runs of dots R substitutes for them) with a single underscore. It also works with the %>% pipe operator familiar from the tidyverse.
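A minimal sketch of clean_names() on the question's mock data, assuming the janitor and magrittr packages are installed (janitor is not part of base R):

```r
# Assumes install.packages("janitor") has been run; magrittr comes with it.
library(janitor)   # provides clean_names()
library(magrittr)  # provides the %>% pipe

a <- data.frame(this.is.fine = letters[1:5],
                this...one...isnt = LETTERS[1:5])

# clean_names() returns a renamed copy, so the result must be reassigned.
a <- a %>% clean_names()
names(a)
```

Unlike the gsub() approach, clean_names() changes the separator to an underscore and lowercases everything, so it suits cases where snake_case names are acceptable rather than a single period between words.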
To remove a character from the values of a data frame column in R, you can use the gsub() function to replace that character with an empty string. For example, if a data frame called df contains a character column x in which each value includes the characters ID, they can be removed with the command gsub("ID","",as.
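A minimal sketch of that idea applied to column values, using a made-up df with a column x (the data is hypothetical, and this is not the truncated command itself):

```r
# Hypothetical data frame `df` with a character column `x`
df <- data.frame(x = c("ID1", "ID22", "ID333"))

# Replace every occurrence of "ID" with the empty string.
# fixed = TRUE matches "ID" literally instead of as a regular expression.
df$x <- gsub("ID", "", df$x, fixed = TRUE)
df$x  # "1" "22" "333"
```

gsub() always returns a character vector, so wrap the result in as.numeric() if the remaining values should be treated as numbers.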
A basic rule in R is to avoid data frame column names that contain spaces. R will accept such names, but they are not syntactically valid, so they cannot be referenced directly in code (for example with df$name) unless they are wrapped in backticks.
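The sketch below shows where the dots in the question come from and how a name with spaces can still be used. By default read.csv() passes headers through make.names() (check.names = TRUE), which replaces each invalid character with a period. The CSV text here is made up for illustration:

```r
# Made-up CSV with a header containing spaces and a hyphen
csv_text <- "sales - thousands,region\n10,North\n20,South\n"

# Default: names are made syntactic, so " - " becomes "..."
mangled <- read.csv(text = csv_text)
names(mangled)  # "sales...thousands" "region"

# check.names = FALSE keeps the original header unchanged
kept <- read.csv(text = csv_text, check.names = FALSE)
names(kept)     # "sales - thousands" "region"

# A non-syntactic name still works, but only inside backticks
kept$`sales - thousands`
```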
names(a) <- gsub(x = names(a), pattern = "\\.", replacement = "#")
You can use the gsub() function to replace each . with another special character, such as #.
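A minimal sketch on the question's mock data showing how the pattern affects the result. "\\." matches a single period, so a run of three periods becomes three # characters. Adding + (one or more) collapses each run into one replacement:

```r
a <- data.frame(this.is.fine = letters[1:5],
                this...one...isnt = LETTERS[1:5])

# "\\." matches one literal period, so every period is replaced separately.
each <- gsub(x = names(a), pattern = "\\.", replacement = "#")
each  # "this#is#fine" "this###one###isnt"

# "\\.+" matches a whole run of periods and replaces it with a single "#".
runs <- gsub(x = names(a), pattern = "\\.+", replacement = "#")
runs  # "this#is#fine" "this#one#isnt"

names(a) <- runs
```

Names containing # are not syntactic either, so columns renamed this way must also be referenced with backticks, for example a$`this#one#isnt`.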