
Elegant R function: mixed case separated by periods to underscore separated lower case and/or camel case

Tags: regex, r

I often receive datasets from collaborators with inconsistently named variables/columns. One of my first tasks is to rename them, and I want a solution that works entirely within R.

as.Given <- c("ICUDays","SexCode","MAX_of_MLD","Age.Group")

underscore_lowercase <- c("icu_days", "sex_code", "max_of_mld","age_group")

camelCase <- c("icuDays", "sexCode", "maxOfMld", "ageGroup")

Given the differing opinions about naming conventions, and in the spirit of what was proposed for Python, what ways are there to go from as.Given to underscore_lowercase and/or camelCase in a user-specified way in R?

Edit: I also found this related post on R / regex, especially the answer by @rengis.

swihart asked Aug 26 '14 11:08



1 Answer

Try the following. Both functions work on at least the examples given:

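toUnderscore <- function(x) {
  # Insert "_" between a letter and a capital that is followed by a
  # lower-case letter: "ICUDays" -> "ICU_Days"
  x2 <- gsub("([A-Za-z])([A-Z])([a-z])", "\\1_\\2\\3", x)
  # Replace literal periods with underscores: "Age.Group" -> "Age_Group"
  x3 <- gsub(".", "_", x2, fixed = TRUE)
  # Split any remaining lower-to-upper boundary: "aB" -> "a_B"
  x4 <- gsub("([a-z])([A-Z])", "\\1_\\2", x3)
  # Lower-case the result
  tolower(x4)
}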

underscore2camel <- function(x) {
  # Drop each "_" and upper-case the character after it (\\U requires perl = TRUE)
  gsub("_(.)", "\\U\\1", x, perl = TRUE)
}

#######################################################
# test
#######################################################

u <- toUnderscore(as.Given)
u
## [1] "icu_days"   "sex_code"   "max_of_mld" "age_group" 

underscore2camel(u)
## [1] "icuDays"  "sexCode"  "maxOfMld" "ageGroup"
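
Since the question asks for a user-specified style and the goal is renaming dataset columns, the two functions can be combined behind one switch. The following is only a sketch: renameVars and the data frame df are hypothetical names made up for illustration, and the two helpers are repeated from above so the block runs on its own.

```r
# Helpers repeated from the answer above so this block is self-contained.
toUnderscore <- function(x) {
  x2 <- gsub("([A-Za-z])([A-Z])([a-z])", "\\1_\\2\\3", x)
  x3 <- gsub(".", "_", x2, fixed = TRUE)
  x4 <- gsub("([a-z])([A-Z])", "\\1_\\2", x3)
  tolower(x4)
}

underscore2camel <- function(x) {
  gsub("_(.)", "\\U\\1", x, perl = TRUE)
}

# Hypothetical wrapper (not part of the original answer):
# choose the target style by name; "underscore" is the default.
renameVars <- function(x, style = c("underscore", "camel")) {
  style <- match.arg(style)
  u <- toUnderscore(x)
  if (style == "camel") underscore2camel(u) else u
}

# Made-up data frame whose column names are the ones from the question.
df <- data.frame(ICUDays = 1:2, SexCode = c("F", "M"),
                 MAX_of_MLD = c(0.5, 1.5), Age.Group = c("a", "b"))

names(df) <- renameVars(names(df), style = "camel")
names(df)
## [1] "icuDays"  "sexCode"  "maxOfMld" "ageGroup"
```

match.arg restricts style to the two listed values and raises an error for anything else, so a typo in the style name does not silently fall through to the default.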
G. Grothendieck answered Oct 13 '22 10:10
