
(In R) How to split words by title case in a string like "WeLiveInCA" into "We Live In CA" while preserving abbreviations?


I know how to split the string at every uppercase letter, but doing that would also split initialisms/abbreviations such as CA, USSR, or even U.S.A., and I need to preserve those.

So I'm thinking of some kind of conditional: if a word in the string isn't an initialism, insert a space wherever a lowercase character is followed by an uppercase character.

My snippet below inserts a space before every capital letter, but it breaks initialisms: CA undesirably becomes C A.

s <- "WeLiveInCA"
trimws(gsub('([[:upper:]])', ' \\1', s))
# "We Live In C A"

or another example...

s <- c("IDon'tEatKittensFYI", "YouKnowYourABCs")
trimws(gsub('([[:upper:]])', ' \\1', s))
# "I Don't Eat Kittens F Y I" "You Know Your A B Cs"

The results I'd want would be:

"We Live In CA"
#
"I Don't Eat Kittens FYI" "You Know Your ABCs"

But this needs to be widely applicable, not just for my examples.

Samantha Karlaina Rhoads asked Dec 29 '25 18:12


1 Answer

Try base R's gregexpr/regmatches.

s <- c("WeLiveInCA", "IDon'tEatKittensFYI", "YouKnowYourABCs")
regmatches(s, gregexpr('[[:upper:]]+[^[:upper:]]*', s))
#[[1]]
#[1] "We"   "Live" "In"   "CA"  
#
#[[2]]
#[1] "IDon't"  "Eat"     "Kittens" "FYI"    
#
#[[3]]
#[1] "You"  "Know" "Your" "ABCs"
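
The call above returns a list of character vectors rather than the single spaced strings shown in the question. As a minimal sketch, assuming the same `s` as above, each element could be pasted back together; the helper name `split_title_case` is hypothetical and not part of base R:

```r
s <- c("WeLiveInCA", "IDon'tEatKittensFYI", "YouKnowYourABCs")

# Hypothetical helper: extract the pieces with the same pattern,
# then join each set of pieces with single spaces.
split_title_case <- function(x) {
  pieces <- regmatches(x, gregexpr("[[:upper:]]+[^[:upper:]]*", x))
  vapply(pieces, paste, character(1), collapse = " ")
}

split_title_case(s)
# "We Live In CA" "IDon't Eat Kittens FYI" "You Know Your ABCs"
```

Note that "IDon't" stays in one piece, because the consecutive capitals I and D are treated the same way as an abbreviation. Any text before the first upper case letter is not matched at all and would be dropped from the result.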

Explanation.

  1. [[:upper:]]+ matches one or more upper case letters;
  2. [^[:upper:]]* matches zero or more occurrences of anything but upper case letters;
  3. In sequence, these two patterns match a run of one or more upper case letters followed by everything up to the next upper case letter, so consecutive capitals such as CA stay in one piece.
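
The rule proposed in the question (insert a space wherever a lower case letter is directly followed by an upper case one) can also be written as a single substitution. The sketch below is an alternative under stated assumptions, not part of the original answer: the function names `space_lower_upper` and `split_leading_I` are hypothetical, and the second step is only a heuristic for the pronoun "I" that would also split strings such as "IDs".

```r
s <- c("WeLiveInCA", "IDon'tEatKittensFYI", "YouKnowYourABCs")

# Insert a space between a lower case letter and the upper case letter after it.
space_lower_upper <- function(x) {
  gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", x)
}

# Heuristic (assumption): a word-initial "I" followed by a capitalised word
# (upper case letter, then lower case letter) is treated as the pronoun "I".
split_leading_I <- function(x) {
  gsub("\\bI(?=[[:upper:]][[:lower:]])", "I ", x, perl = TRUE)
}

spaced <- space_lower_upper(s)
spaced
# "We Live In CA" "IDon't Eat Kittens FYI" "You Know Your ABCs"

split_leading_I(spaced)
# "We Live In CA" "I Don't Eat Kittens FYI" "You Know Your ABCs"

space_lower_upper("WeLiveInTheU.S.A.")
# "We Live In The U.S.A."
```

Because the substitution only fires after a lower case letter, dotted initialisms such as "U.S.A." are left intact, whereas the extraction pattern `[[:upper:]]+[^[:upper:]]*` would break them into "U.", "S." and "A.". Telling "IDon't" apart from "ABCs" is inherently ambiguous without a word list, which is why the pronoun step is kept separate and optional.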
Rui Barradas answered Jan 01 '26 08:01