 

R: Capitalizing everything after a certain character

I would like to capitalize everything that comes after the first _ in each element of a character vector. For example, the following vector:

x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f") 

Should come out like this:

"NYC_23DF" "BOS_3_RB" "mgh_3_3_F"

I have been trying to use regular expressions but have not been able to get this to work. Any suggestions would be appreciated.

asked May 29 '12 at 08:05 by Mikko




2 Answers

You were very close:

gsub("(_.*)","\\U\\1",x,perl=TRUE)

seems to work. You just needed to use _.* (an underscore followed by zero or more characters of any kind) rather than _* (zero or more underscores).
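To see why _* fails, it can help to run both patterns on the example vector. This is a small base-R sketch, not part of the original answer: because _* can match the empty string, the only text it ever captures is "" or "_", and uppercasing those changes nothing.

```r
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f")

# "_*" = zero or more underscores: each captured group is "" or "_",
# so \\U has nothing to uppercase
gsub("(_*)", "\\U\\1", x, perl = TRUE)
# returns x unchanged: "NYC_23df" "BOS_3_rb" "mgh_3_3_f"

# "_.*" = the first underscore followed by the rest of the string
gsub("(_.*)", "\\U\\1", x, perl = TRUE)
# returns "NYC_23DF" "BOS_3_RB" "mgh_3_3_F"
```

The match starts at the first underscore because the regex engine scans left to right, and the greedy .* then swallows everything after it, including any later underscores.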

To take this apart a bit more:

  • _.* gives a regular expression pattern that matches an underscore _ followed by any number (including 0) of additional characters; . denotes "any character" and * denotes "zero or more repeats of the previous element"
  • surrounding this regular expression with parentheses () makes it a capture group, i.e. a piece of the match we want to store
  • \\1 in the replacement string says "insert the contents of the first capture group", i.e. whatever matched _.*
  • \\U, in conjunction with perl=TRUE, says "put what follows in upper case" (uppercasing _ has no effect; if we wanted to capitalize everything after (for example) a lower-case g, we would need to exclude the g from the stored pattern and include it in the replacement pattern: gsub("g(.*)","g\\U\\1",x,perl=TRUE))
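The lower-case g variant from the last point can be checked directly. The second call below is an alternative not taken from the answer: it assumes PCRE lookbehind syntax (?<=g), which requires a preceding g without consuming it, so the g never has to be re-inserted in the replacement.

```r
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f")

# Keep "g" outside the capture group and write it back literally,
# so only the text after it is uppercased
gsub("g(.*)", "g\\U\\1", x, perl = TRUE)
# returns "NYC_23df" "BOS_3_rb" "mgH_3_3_F"

# Alternative sketch: the lookbehind checks for "g" without including it in the match
gsub("(?<=g)(.*)", "\\U\\1", x, perl = TRUE)
# returns "NYC_23df" "BOS_3_rb" "mgH_3_3_F"
```

Elements that contain no g are returned as they are, since neither pattern matches them.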

For more details, search for "replacement" and "capitalizing" in ?gsub (and see ?regexp for general information about regular expressions).

answered Sep 22 '22 at 20:09 by Ben Bolker


gsubfn in the gsubfn package is like gsub except that the replacement can be a function. Here we match _ and everything after it, and feed the match through toupper:

> library(gsubfn)
>
> gsubfn("_.*", toupper, x)
[1] "NYC_23DF"  "BOS_3_RB"  "mgh_3_3_F"

Note that this approach needs only a very simple regular expression, with no capture group and no perl=TRUE.
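If you would rather not add a package dependency, a similar "apply a function to the match" effect is possible in base R with regexpr() and the regmatches<- replacement function. This is a sketch that neither answer gives; the variable m is just an illustrative name for the match data.

```r
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f")

# regexpr() records where the first match of "_.*" starts in each element
# (-1 for elements with no underscore)
m <- regexpr("_.*", x)

# regmatches() pulls out the matched substrings; assigning through
# regmatches<- writes the uppercased versions back, leaving unmatched
# elements untouched
regmatches(x, m) <- toupper(regmatches(x, m))
x
# "NYC_23DF" "BOS_3_RB" "mgh_3_3_F"
```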

answered Sep 22 '22 at 20:09 by G. Grothendieck