Splitting string between capital and lowercase character in R?

Tags: string, r

I have a vector of character strings:

v1 <- c("Firstname LastnameFirstname Lastname", 
"Firstname Lastname", 
"Firstname Lastname", 
"Firstname LastnameFirstname Lastname")

I'd like to split each string at the point where a lowercase letter is immediately followed by a capital letter, keeping both letters.

The desired output would be:

[1] "Firstname Lastname" "Firstname Lastname" "Firstname Lastname" "Firstname Lastname" "Firstname Lastname" "Firstname Lastname"

Following examples on Stack Exchange, I tried combining gsub with strsplit:

unlist(strsplit( gsub("([a-z][A-Z])","\\1~",v1), "~" ))

but this does not split between the two characters; it splits after the whole regex match, so the capital letter stays attached to the previous piece:

[1] "Firstname LastnameF" "irstname Lastname"   "Firstname Lastname"  "Firstname Lastname"  "Firstname LastnameF" "irstname Lastname"  

How do I split between the two characters while still retaining both of them?

JuusoT asked Dec 23 '22 18:12


1 Answer

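We can use regex lookarounds to match the zero-width position that is preceded by a lowercase letter (positive lookbehind, (?<=[a-z])) and followed by an uppercase letter (positive lookahead, (?=[A-Z])). Lookarounds consume no characters, so strsplit cuts at that position without dropping either letter. They require perl = TRUE, because R's default regex engine does not support lookarounds.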

unlist(strsplit(v1, "(?<=[a-z])(?=[A-Z])", perl = TRUE))
#[1] "Firstname Lastname" "Firstname Lastname" "Firstname Lastname" 
#[4] "Firstname Lastname" "Firstname Lastname" "Firstname Lastname"
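
As a side note, the gsub approach from the question can probably be made to work as well. The sketch below is an alternative, not the method above: it captures the lowercase and the uppercase letter in two separate groups and writes the delimiter between them, rather than after the whole match. It assumes that "~" never occurs in the data; the names marked, res_gsub and res_look are only illustrative.

```r
v1 <- c("Firstname LastnameFirstname Lastname",
        "Firstname Lastname",
        "Firstname Lastname",
        "Firstname LastnameFirstname Lastname")

# Two capture groups: \\1 is the lowercase letter, \\2 the capital letter.
# Putting "~" between them marks the split point without losing either.
# Assumption: "~" does not appear anywhere in the input strings.
marked <- gsub("([a-z])([A-Z])", "\\1~\\2", v1)

# Split on the literal delimiter (fixed = TRUE avoids regex interpretation).
res_gsub <- unlist(strsplit(marked, "~", fixed = TRUE))

# Lookaround version, for comparison.
res_look <- unlist(strsplit(v1, "(?<=[a-z])(?=[A-Z])", perl = TRUE))

# Both approaches should give the same six strings.
identical(res_gsub, res_look)
```

The lookaround version needs no placeholder character, which makes it the safer choice when the input might contain the delimiter.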
akrun answered Jan 31 '23 11:01