Capitalize with dplyr

I am cleaning data with dplyr. One of the things I want to do is capitalize the first letter of the values in certain columns.

    data$surname
    john
    Mary
    John
    mary
    ...

I suppose I have to use dplyr's mutate function with something like this:

    titleCase <- function(x) {
      s <- strsplit(as.character(x), " ")[[1]]
      paste(toupper(substring(s, 1, 1)), substring(s, 2),
            sep = "", collapse = " ")
    }

But how do I combine the two? I get all kinds of errors or truncated data frames.

Thanks

Philippe Ramirez asked Nov 29 '22 23:11


1 Answer

We can use sub with perl=TRUE, which lets \U in the replacement uppercase the captured first character:

    sub("(.)", "\\U\\1", data$surname, perl=TRUE)
    #[1] "John" "Mary" "John" "Mary"
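
As a small sketch of what the pattern does: `(.)` captures the first character and `\\U\\1` writes it back in upper case. Because `sub` replaces only the first match, a value with several words keeps its later words unchanged. The `gsub` variant below is an assumption about what one might want for multi-word values, not part of the original answer, and the vector `x` is made up for illustration.

```r
# Illustrative input (not from the question's data)
x <- c("john", "Mary", "john smith")

# sub() replaces only the first match, so only the first character changes.
first_only <- sub("(.)", "\\U\\1", x, perl = TRUE)

# gsub() with a start-of-string-or-whitespace group capitalizes every word.
# \\1 restores the matched whitespace; \\U\\2 uppercases the letter after it.
every_word <- gsub("(^|\\s)(\\w)", "\\1\\U\\2", x, perl = TRUE)

print(first_only)  # "John" "Mary" "John smith"
print(every_word)  # "John" "Mary" "John Smith"
```

Neither call lowercases the remaining characters, so a value such as "jOHN" would become "JOHN".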

Implementing this in a dplyr workflow:

    library(dplyr)
    data %>%
      mutate(surname = sub("(.)", "\\U\\1", surname, perl=TRUE))
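
For comparison, the `titleCase` function from the question returns a single string, because `strsplit(...)[[1]]` keeps only the first element of its input; that is probably why it misbehaves inside `mutate`. A minimal vectorized sketch in base R follows. The name `title_case_vec` is hypothetical, and missing values are not handled specially.

```r
# Hypothetical vectorized variant of the question's titleCase():
# process every element of x instead of only strsplit(...)[[1]].
title_case_vec <- function(x) {
  words <- strsplit(as.character(x), " ", fixed = TRUE)
  vapply(words, function(s) {
    # Uppercase the first letter of each word, then rejoin with spaces
    paste(toupper(substring(s, 1, 1)), substring(s, 2),
          sep = "", collapse = " ")
  }, character(1))
}

out <- title_case_vec(c("john", "Mary", "john smith"))
print(out)  # "John" "Mary" "John Smith"

# Inside a pipeline this could be used as:
# data %>% mutate(surname = title_case_vec(surname))
```

Since the function returns one string per input element, its result has the same length as the column, which is what `mutate` expects.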

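If we need to do this on multiple columns (with no columns selected, every column is transformed):

    # Note: mutate_each() and funs() are deprecated in current dplyr releases
    data %>%
      mutate_each(funs(sub("(.)", "\\U\\1", ., perl=TRUE)))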
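
In recent dplyr releases the same multi-column step is usually written with `across()`. The sketch below assumes dplyr 1.0.0 or later and recreates the answer's `data` so that it runs on its own; `everything()` mirrors the all-columns behavior of the call above.

```r
library(dplyr)

# Same example data as in the answer
data <- data.frame(
  surname   = c("john", "Mary", "John", "mary"),
  firstname = c("abe", "Jacob", "george", "jen"),
  stringsAsFactors = FALSE
)

# Assumes dplyr >= 1.0.0, where across() supersedes mutate_each()/funs().
res <- data %>%
  mutate(across(everything(), ~ sub("(.)", "\\U\\1", .x, perl = TRUE)))

print(res)
```

To restrict the change to some columns, `everything()` can be swapped for a selection such as `c(surname, firstname)`.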

Just to check on a larger data set (data1, defined below) that every value now starts with a capital letter:

    res <- data1 %>%
      mutate(surname = sub("(.)", "\\U\\1", surname, perl=TRUE))
    sum(grepl("[A-Z]", substr(res$surname, 1, 1)))
    #[1] 500000

Data used above:

    data <- data.frame(surname = c("john", "Mary", "John", "mary"),
                       firstname = c("abe", "Jacob", "george", "jen"),
                       stringsAsFactors = FALSE)

    data1 <- data.frame(surname = sample(c("john", "Mary", "John", "mary"),
                                         500000, replace = TRUE),
                        stringsAsFactors = FALSE)

akrun answered Dec 26 '22 00:12