Extract string between multiple words, using gsub

Question

I am trying to isolate words from a string in R using gsub. I want to extract a name that sits between either ")" and "(m)" (for males) or between ")" and "(f)" (for females). I am struggling to handle both cases in one line of code.

name<-c("Dr. T. (Tom) Bailey (m), UCL- Physics" , "Dr. B.K. (Barbara) Blue (f), Oxford - Political Science")

malename <- gsub(".*\\) (.*) \\(m\\).*", "\\1", name)
femname <- gsub(".*\\) (.*) \\(f\\).*", "\\1", name)

The code above gives me the names for males and females separately, but ideally I want to obtain their last names in one variable. This would involve some kind of OR (so (m) OR (f)), but I don't know how to incorporate this.

Wiktor Stribiżew · Accepted Answer

If you need to match either m or f, the best way to match them is a character class (or, in POSIX terminology, a bracket expression): [mf].

Your regex will look like

".*\\)\\s+(.*)\\s+\\([mf]\\).*"
                     ^^^^

You can use this regex with sub, which performs at most one match and replacement per string:

name<-c("Dr. T. (Tom) Bailey (m), UCL- Physics" , "Dr. B.K. (Barbara) Blue (f), Oxford - Political Science")
res <- sub(".*\\)\\s+(.*)\\s+\\([mf]\\).*", "\\1", name)
res
## => [1] "Bailey" "Blue"
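
For comparison, the same "m OR f" choice can also be written as an alternation group, which is the form you would need if the alternatives were longer than a single character. The following is only a sketch under the assumption that the input looks like the name vector above; the variable names pattern_alt, last_name, and gender are illustrative and not part of the original answer.

```r
name <- c("Dr. T. (Tom) Bailey (m), UCL- Physics",
          "Dr. B.K. (Barbara) Blue (f), Oxford - Political Science")

# Same pattern as above, but with an alternation group (m|f) instead of [mf].
# Group 1 is still the last name; group 2 now captures the gender letter.
pattern_alt <- ".*\\)\\s+(.*)\\s+\\((m|f)\\).*"

last_name <- sub(pattern_alt, "\\1", name)  # backreference to group 1
gender    <- sub(pattern_alt, "\\2", name)  # backreference to group 2

print(last_name)
print(gender)
```

With the two sample strings this should give "Bailey" "Blue" for last_name and "m" "f" for gender. For single characters the bracket expression [mf] is usually the simpler choice.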

akrun · Answer

Try sub, skipping everything up to the first closing parenthesis and capturing the word that follows:

sub("^[^)]+\\)\\s+(\\w+).*", "\\1", name)
#[1] "Bailey" "Blue"
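
One thing to keep in mind with either answer: sub returns the input string unchanged when the pattern does not match, so a malformed entry would silently end up in the result. A minimal sketch of one way to guard against that, assuming a hypothetical third entry without parentheses (the vector x and the variable last_name are made up for illustration):

```r
x <- c("Dr. T. (Tom) Bailey (m), UCL- Physics",
       "Dr. B.K. (Barbara) Blue (f), Oxford - Political Science",
       "No parentheses here")  # hypothetical entry that does not match

pattern <- "^[^)]+\\)\\s+(\\w+).*"

# sub() leaves non-matching strings untouched, so test with grepl() first
# and return NA for entries where the pattern is not found.
last_name <- ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)

print(last_name)
```

Here the first two entries should yield "Bailey" and "Blue", and the third becomes NA instead of the original string.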
