I am doing data cleaning with dplyr. One of the things I want to do is to capitalize values in certain columns.
data$surname
john
Mary
John
mary
...
I suppose I have to use the mutate function of dplyr with something like this:
titleCase <- function(x) {
  s <- strsplit(as.character(x), " ")[[1]]
  paste(toupper(substring(s, 1, 1)), substring(s, 2),
        sep = "", collapse = " ")
}
But how do I combine the two? I get all kinds of errors or truncated data frames.
Thanks
We can use sub, which replaces only the first match; with perl=TRUE, \\U uppercases the captured first character:
sub("(.)", "\\U\\1", data$surname, perl=TRUE)
#[1] "John" "Mary" "John" "Mary"
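As for why titleCase from the question misbehaves inside mutate: strsplit(...)[[1]] keeps only the split of the first element, so the function returns a single string that mutate then recycles over every row. A minimal vectorized sketch in base R follows; the name cap_first is hypothetical and not part of the original question or answer.

```r
# cap_first is a hypothetical helper name, used here only for illustration.
# It uppercases the first character of every element of a character vector.
cap_first <- function(x) {
  x <- as.character(x)
  out <- paste0(toupper(substr(x, 1, 1)), substr(x, 2, nchar(x)))
  # paste0() would turn missing values into the string "NANA", so restore them
  out[is.na(x)] <- NA_character_
  out
}

cap_first(c("john", "Mary", "John", "mary"))
#[1] "John" "Mary" "John" "Mary"
```

Because it returns one value per input element, it can be used directly as mutate(surname = cap_first(surname)).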
Implementing this in the dplyr workflow:
library(dplyr)
data %>%
mutate(surname = sub("(.)", "\\U\\1", surname, perl=TRUE))
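The sub call above changes only the first character of each value. If a value can hold several words, which the titleCase function in the question seems to anticipate, a gsub variant with a word boundary is one possible sketch. The input vector x is made up for illustration.

```r
# Made-up example values; not from the original data.
x <- c("mary ann", "john", "Mary")

# \\b matches at the start of each word; \\U uppercases the captured letter.
# Note that \\b also matches after an apostrophe or hyphen.
gsub("\\b([a-z])", "\\U\\1", x, perl = TRUE)
#[1] "Mary Ann" "John"     "Mary"
```

Inside mutate, the same expression would replace the sub call, for example surname = gsub("\\b([a-z])", "\\U\\1", surname, perl = TRUE).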
If we need to do this on multiple columns:
data %>%
mutate_each(funs(sub("(.)", "\\U\\1", ., perl=TRUE)))
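Note that mutate_each and funs are deprecated in current dplyr releases. Assuming dplyr 1.0.0 or later, a roughly equivalent sketch uses across; the small data frame is repeated here only so the block runs on its own.

```r
library(dplyr)

data <- data.frame(surname = c("john", "Mary", "John", "mary"),
                   firstname = c("abe", "Jacob", "george", "jen"),
                   stringsAsFactors = FALSE)

# Apply the same substitution to every column;
# swap everything() for c(surname, firstname) to target specific columns.
res <- data %>%
  mutate(across(everything(), ~ sub("(.)", "\\U\\1", .x, perl = TRUE)))

res$firstname
#[1] "Abe"    "Jacob"  "George" "Jen"
```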
Just to check on a larger data set (data1, defined at the end):
res <- data1 %>%
mutate(surname = sub("(.)", "\\U\\1", surname, perl=TRUE))
sum(grepl("[A-Z]", substr(res$surname, 1,1)))
#[1] 500000
Data used above:
data <- data.frame(surname = c("john", "Mary", "John", "mary"),
                   firstname = c("abe", "Jacob", "george", "jen"),
                   stringsAsFactors = FALSE)
data1 <- data.frame(surname = sample(c("john", "Mary", "John", "mary"),
                                     500000, replace = TRUE),
                    stringsAsFactors = FALSE)