 

Decapitalize human names (accounting for ' and -)

Tags: capitalize, r

I've got a vector of (human) names, all in capitals:

names <- c("FRIEDRICH SCHILLER", "FRANK O'HARA", "HANS-CHRISTIAN ANDERSEN")

So far, to decapitalize the names (capitalize only the first letters), I was using

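simpleDecap <- function(x) {
  # split one name into words at spaces
  s <- strsplit(x, " ")[[1]]
  # keep each word's first letter, lower-case the rest, rejoin with spaces
  paste0(substring(s, 1, 1), tolower(substring(s, 2)), collapse = " ")
}
sapply(names, simpleDecap, USE.NAMES = FALSE)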
# [1] "Friedrich Schiller"         "Frank O'hara"         "Hans-christian Andersen"

But I also want to account for ' and -. Using s <- strsplit(x, " |\\'|\\-")[[1]] does find the right letters, but then the ' and - are lost when the pieces are collapsed. Hence, I tried

simpleDecap2 <- function(x) {
  # one pass per separator; fixed = TRUE treats each separator literally
  for (char in c(" ", "-", "'")) {
    s <- strsplit(x, char, fixed = TRUE)[[1]]
    x <- paste0(substring(s, 1, 1), tolower(substring(s, 2)), collapse = char)
  }
  x
}

but that's even worse, of course, because the passes run one after the other: each later pass lower-cases everything except the first letter of its pieces, wiping out the capitals kept by the earlier passes:

sapply(names, simpleDecap2, USE.NAMES=FALSE)
# [1] "Friedrich schiller"      "Frank o'hara"            "Hans-christian andersen"

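I think the right approach is to split with s <- strsplit(x, " |\\'|\\-")[[1]], but then the collapse= argument of paste0() is the problem, since a single collapse string cannot put back the different separators.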
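One way to keep the separators, sketched here under the assumption that space, apostrophe and hyphen are the only separators, is to extract the words and the separators together with regmatches() and gregexpr() instead of splitting the separators away. The helper name decapTokens is hypothetical and does not come from the question or the answer.

```r
# Hypothetical helper: capitalize the first letter of every word, treating
# space, apostrophe and hyphen as separators (an assumption of this sketch).
decapTokens <- function(x) {
  # "[^ '-]+" matches a word and "[ '-]" a single separator, so each name
  # becomes a vector of words and separators in their original order
  parts <- regmatches(x, gregexpr("[^ '-]+|[ '-]", x))
  vapply(parts, function(tok) {
    # upper-case the first character and lower-case the rest of each token;
    # separators are left unchanged by toupper()/tolower()
    paste0(toupper(substring(tok, 1, 1)), tolower(substring(tok, 2)),
           collapse = "")
  }, character(1))
}

names <- c("FRIEDRICH SCHILLER", "FRANK O'HARA", "HANS-CHRISTIAN ANDERSEN")
decapTokens(names)
```

Because every separator comes back as its own token, collapse = "" restores the original punctuation, which is exactly what a single collapse string in paste0() cannot do.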

asked Sep 24 '15 11:09 by MERose




1 Answer

This seems to work, using Perl-compatible regular expressions (perl = TRUE):

gsub("\\b(\\w)([\\w]+)", "\\1\\L\\2", names, perl = TRUE)
# [1] "Friedrich Schiller"      "Frank O'Hara"            "Hans-Christian Andersen"

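In the replacement string, \L converts what follows it (here, match group 2) to lower case, while the first letter, \1, is kept as it is. A one-letter word such as the O in O'HARA is not matched at all, because ([\w]+) needs at least one further character, so it also stays upper case.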
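The answer's replacement keeps the first letter unchanged, which works because the input is all caps. If some names might arrive in lower or mixed case, a variant can also use \U, which R documents alongside \L for perl = TRUE replacements, and \w* so that one-letter words are matched too. This is a sketch under that assumption; the function name titleCaseNames is hypothetical.

```r
# Hypothetical variant of the answer's gsub() call:
#   \\U upper-cases the first letter (group 1),
#   \\L lower-cases the rest of the word (group 2),
#   \\w* (instead of \\w+) also matches one-letter words like the "O" in "O'Hara".
titleCaseNames <- function(x) {
  gsub("\\b(\\w)(\\w*)", "\\U\\1\\L\\2", x, perl = TRUE)
}

titleCaseNames(c("friedrich schiller", "FRANK O'HARA", "hans-christian ANDERSEN"))
```

Since \b treats ' and - as word boundaries, the letters after them are capitalized without any explicit list of separators.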

answered Nov 15 '22 00:11 by Konrad Rudolph