 

Extract string between multiple words, using gsub

Tags: regex, r, gsub

I am trying to isolate words from a string in R using gsub. I want to extract a name that can be found either between ")" and "(m)" (for males) or between ")" and "(f)" (for females). I am struggling to combine both cases in one line of code.

name<-c("Dr. T. (Tom) Bailey (m), UCL- Physics" , "Dr. B.K. (Barbara) Blue (f), Oxford - Political Science")

malename<-gsub(".*\\) (.*) \\(m).*", "\\1", name)
femname<-gsub(".*\\) (.*) \\(f).*", "\\1", name)

The code above gives me the last names of males and females separately, but ideally I want all the last names in one variable. This would involve some kind of OR (so "(m)" OR "(f)"), but I don't know how to incorporate it.
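Two separate calls are needed because gsub() returns any element it cannot match unchanged, so each result is only correct for one sex. A minimal sketch of that behaviour, reusing the name vector from the question (the ")" after m is escaped here as \\) to make the literal parenthesis explicit):

```r
name <- c("Dr. T. (Tom) Bailey (m), UCL- Physics",
          "Dr. B.K. (Barbara) Blue (f), Oxford - Political Science")

# Only the first string contains "(m)". The second string does not match,
# so gsub() returns it unchanged instead of extracting a last name.
malename <- gsub(".*\\) (.*) \\(m\\).*", "\\1", name)
print(malename)
```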

asked Mar 11 '23 15:03 by Tom Bailey


2 Answers

If you need to match either m or f, the best way to match them is a character class (or, in POSIX terminology, a bracket expression): [mf].

Your regex will look like

".*\\)\\s+(.*)\\s+\\([mf]\\).*"
                     ^^^^


You may use the regex with sub instead of gsub, so that at most one match and replacement is performed per string:

name<-c("Dr. T. (Tom) Bailey (m), UCL- Physics" , "Dr. B.K. (Barbara) Blue (f), Oxford - Political Science")
res <- sub(".*\\)\\s+(.*)\\s+\\([mf]\\).*", "\\1", name)
res
## => [1] "Bailey" "Blue"  
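The same OR can also be written as an alternation, (m|f), instead of the bracket expression. A sketch assuming the same name vector; the alternation opens a second capture group, so the last name is still \\1 and the sex marker becomes available as \\2:

```r
name <- c("Dr. T. (Tom) Bailey (m), UCL- Physics",
          "Dr. B.K. (Barbara) Blue (f), Oxford - Political Science")

pattern <- ".*\\)\\s+(.*)\\s+\\((m|f)\\).*"

# (m|f) matches "m" OR "f"; group 1 is still the last name.
res_alt <- sub(pattern, "\\1", name)
print(res_alt)

# Group 2 holds whichever of m / f was matched.
sex <- sub(pattern, "\\2", name)
print(sex)
```

For a choice between single characters, [mf] and (m|f) match the same strings; the alternation is the more general form when the options are longer than one character.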
answered Mar 13 '23 14:03 by Wiktor Stribiżew


Try with sub, capturing the first word after the first closing parenthesis:

sub("^[^)]+\\)\\s+(\\w+).*", "\\1", name)
#[1] "Bailey" "Blue"  
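Note that \\w+ stops at the first space, so a two-word surname would be cut short, and the pattern does not check for "(m)" or "(f)". If both the last name and the sex marker are wanted from one pass, base R's regexec() with regmatches() returns every capture group at once. A sketch reusing the bracket-expression pattern from the first answer with [mf] wrapped in a group; the third entry is a hypothetical name added only to show the multi-word case:

```r
name <- c("Dr. T. (Tom) Bailey (m), UCL- Physics",
          "Dr. B.K. (Barbara) Blue (f), Oxford - Political Science",
          # hypothetical extra entry with a two-word surname
          "Dr. A. (Ann) van Dyke (f), Hypothetical University")

# \\w+ captures only the first word after ")", i.e. "van" for the third entry.
first_word <- sub("^[^)]+\\)\\s+(\\w+).*", "\\1", name)
print(first_word)

# regexec() records the full match plus each capture group;
# regmatches() turns that into one character vector per input string.
m <- regmatches(name, regexec(".*\\)\\s+(.*)\\s+\\(([mf])\\).*", name))
parts <- do.call(rbind, m)  # columns: full match, last name, sex
people <- data.frame(last_name = parts[, 2], sex = parts[, 3])
print(people)
```

Strings that do not match produce character(0) in m, and rbind() silently drops those, so check lengths(m) first if some entries may lack the "(m)"/"(f)" marker.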
answered Mar 13 '23 14:03 by akrun