I have a large number of data sets each containing a long list of column names. In some files the column names are all capital letters and in some files only the first letter of the column names is capitalized. I need to append the data sets and thought the easiest way to match column names among data sets would be to convert the all-capital names into names with only the first letter capitalized.
I am hoping to find a general solution, maybe even a one-liner.
Here is my example data set. The desired names are included in the names() statements.
my.data2 <- "
landuse units grade CLAY LINCOLN BASINANDRANGE MCCARTNEY MAPLE
apple acres AAA 0 2 3 4 6
apple acres AA 1000 900 NA NA 700
pear acres AA 10.0 20 NA 30.0 40
peach acres AAA 500 400 350 300 200
"
my.data2 <- read.table(textConnection(my.data2), header=TRUE)
names(my.data2)[names(my.data2)=="CLAY"] <- "Clay"
names(my.data2)[names(my.data2)=="BASINANDRANGE"] <- "BasinandRange"
names(my.data2)[names(my.data2)=="LINCOLN"] <- "Lincoln"
names(my.data2)[names(my.data2)=="MCCARTNEY"] <- "McCartney"
names(my.data2)[names(my.data2)=="MAPLE"] <- "Maple"
my.data2
Note that I included the names McCartney and BasinandRange to make things more realistic and more difficult. However, if I can find a one-liner to deal with 95% of the names and use the above names() statements to deal with complications like McCartney and BasinandRange, that would be great.
I have searched the internet, including the StackOverflow archives, without finding a solution. Sorry if I overlooked one. Thank you for any help.
In R, tolower() converts the upper-case letters of a character vector to lower case and returns the lower-cased strings; toupper() does the reverse.
To change all the column names of a data frame to lower case with dplyr, use rename_with() and pass tolower as the function to apply.
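As a small illustration of what tolower() and toupper() do to a vector of column names, here is a minimal base R sketch. The vector nms is made up for the example. Note that both functions discard the original capitalisation, so a name such as McCartney cannot be recovered afterwards.

```r
# Illustrative names only (not read from a file)
nms <- c("landuse", "CLAY", "McCartney")

lower <- tolower(nms)  # every letter lower-cased
upper <- toupper(nms)  # every letter upper-cased

lower
# [1] "landuse"   "clay"      "mccartney"
upper
# [1] "LANDUSE"   "CLAY"      "MCCARTNEY"
```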
Here is a one-liner implementing "the easiest way to match column names among data sets" that I can think of:
## Columns 1:3 left unaltered since they are not place names.
names(my.data2)[-1:-3] <- tolower(names(my.data2)[-1:-3])
## View the results
names(my.data2)
# [1] "landuse" "units" "grade" "clay"
# [5] "lincoln" "basinandrange" "mccartney" "maple"
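Since the end goal is appending data sets, here is a minimal sketch of how lower-casing the names lets rbind() line the columns up. The two small data frames a and b are invented for illustration and are not from the question.

```r
# Two toy data sets whose column names differ only in case (assumed data)
a <- data.frame(landuse = "apple", CLAY = 0,  LINCOLN = 2)
b <- data.frame(landuse = "pear",  Clay = 10, Lincoln = 20)

# Normalise both sets of names to lower case
names(a) <- tolower(names(a))
names(b) <- tolower(names(b))

# rbind() on data frames matches columns by name, so the rows now stack
combined <- rbind(a, b)
names(combined)
# [1] "landuse" "clay"    "lincoln"
```

Without the tolower() step, rbind() would stop with an error because the names do not match.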
Easy Solution
names(DF) <- toupper(names(DF))
This is now a job for janitor::clean_names(); just choose the case parameter that fits your needs.
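I believe data.table syntax would be more efficient and save time. It is also a one-line statement once the data are a data.table. Note that setnames() needs old and new names of the same length, so only columns 4 to 8 are named in both arguments.
library(data.table)
setDT(my.data2)  # convert to a data.table by reference
setnames(my.data2, old = names(my.data2)[4:8], new = tolower(names(my.data2)[4:8]))
my.data2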
# landuse units grade clay lincoln basinandrange mccartney maple
#1: apple acres AAA 0 2 3 4 6
#2: apple acres AA 1000 900 NA NA 700
#3: pear acres AA 10 20 NA 30 40
#4: peach acres AAA 500 400 350 300 200
Combining two of the answers here, I've come up with an elegant tidy way.
It renames every column by capitalising the first letter of each word and lower-casing the rest, so landuse, units and grade become Landuse, Units and Grade as well.
library(tidyverse)
my.data2 %>%
rename_with(str_to_title)
A "tidy" solution:
library(dplyr)
my.data2.mod <- my.data2 %>%
rename_at(c("CLAY", "LINCOLN", "BASINANDRANGE", "MCCARTNEY", "MAPLE"),
.funs = tolower)
names(my.data2.mod)
# [1] "landuse" "units" "grade" "clay"
# [5] "lincoln" "basinandrange" "mccartney" "maple"
Also, to answer the original question and leave some cases capitalized, you can use the snakecase package:
library(snakecase)
my.data2.mod = my.data2 %>%
rename_at(
c("CLAY", "LINCOLN", "BASINANDRANGE", "MCCARTNEY", "MAPLE"),
.funs = list(
~ to_upper_camel_case(.,
abbreviations = c("McCartney", "BasinandRange")
)
)
)
names(my.data2.mod)
# [1] "landuse" "units" "grade" "Clay"
# [5] "Lincoln" "BasinandRange" "McCartney" "Maple"
Another option:
colnames(df) <- stringr::str_to_title(colnames(df))
I used Josh O'Brien's answer, but eventually wrote the code below, which creates column names with the first letter in upper case and the other letters in lower case, with a few exceptions handled as in the original post. Below I used the same data set as in the original post but read it into R differently; n.col is the number of columns in the data file:
## Count the columns by reading only the header line
n.col <- length(scan("c:/users/mark w miller/simple R programs/names_with_capital_letters.txt",
                     what = "character", nlines = 1))
## First three columns are character, the rest numeric
my.data2 <- read.table(file = "c:/users/mark w miller/simple R programs/names_with_capital_letters.txt",
                       na.strings = "NA", header = TRUE,
                       colClasses = c(rep("character", 3), rep("numeric", n.col - 3)))
first.letter <- substring(names(my.data2)[-1:-3], 1, 1)
other.letters <- tolower(substring(names(my.data2)[-1:-3], 2))
newnames <- paste(first.letter, other.letters, sep="")
names(my.data2)[-1:-3] <- newnames
names(my.data2)[names(my.data2)=="Basinandrange"] <- "BasinandRange"
names(my.data2)[names(my.data2)=="Mccartney"] <- "McCartney"
my.data2
# landuse units grade Clay Lincoln BasinandRange McCartney Maple
# 1 apple acres AAA 0 2 3 4 6
# 2 apple acres AA 1000 900 NA NA 700
# 3 pear acres AA 10 20 NA 30 40
# 4 peach acres AAA 500 400 350 300 200
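The same idea can be wrapped in a small reusable function, which may help when many files need the treatment. This is only a sketch: the helper name cap_first and the exceptions lookup vector (keyed by the all-capitals form of each name) are hypothetical and not part of the original answer.

```r
# Hypothetical helper: upper-case the first letter, lower-case the rest,
# then overwrite any name listed in `exceptions` (keyed by its all-caps form).
cap_first <- function(x, exceptions = character()) {
  out <- paste0(toupper(substring(x, 1, 1)), tolower(substring(x, 2)))
  key <- toupper(x)
  hit <- key %in% names(exceptions)
  out[hit] <- exceptions[key[hit]]
  out
}

# Special cases that simple case folding cannot reproduce
exceptions <- c(MCCARTNEY = "McCartney", BASINANDRANGE = "BasinandRange")

old <- c("CLAY", "LINCOLN", "BASINANDRANGE", "MCCARTNEY", "Maple")
new <- cap_first(old, exceptions)
new
# [1] "Clay"          "Lincoln"       "BasinandRange" "McCartney"     "Maple"
```

With the data frame from the question, it could be applied as names(my.data2)[-(1:3)] <- cap_first(names(my.data2)[-(1:3)], exceptions), leaving the first three columns untouched.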
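This will make every column name upper case. rename_with() takes the data frame as its first argument and the renaming function as its second:
library(dplyr)
my.data2 <- rename_with(my.data2, toupper)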