Extracting everything between two symbols in a string

Tags: regex, r, gsub

I have a vector containing some names. I want to extract the title from every element, i.e. everything between the ", " (including the white space) and the ".".

> head(combi$Name)
[1] "Braund, Mr. Owen Harris"
[2] "Cumings, Mrs. John Bradley (Florence Briggs Thayer)"
[3] "Heikkinen, Miss. Laina"
[4] "Futrelle, Mrs. Jacques Heath (Lily May Peel)"
[5] "Allen, Mr. William Henry"
[6] "Moran, Mr. James"

I suppose gsub might be useful, but I am having difficulty finding the right regular expression for this.

asked Feb 16 '14 15:02

Gianluca

2 Answers

1) sub. With sub:

> sub(".*, ([^.]*)\\..*", "\\1", Name)
[1] "Mr"   "Mrs"  "Miss" "Mrs"  "Mr"   "Mr"
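
The snippets in this answer refer to a vector called Name that is never defined here; presumably it stands for combi$Name from the question. A minimal sketch for reproducing (1) under that assumption, with the pattern broken down in comments (the variable title is only an illustrative name):

```r
# Assumption: Name holds the six values shown by head(combi$Name) in the question.
Name <- c("Braund, Mr. Owen Harris",
          "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
          "Heikkinen, Miss. Laina",
          "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
          "Allen, Mr. William Henry",
          "Moran, Mr. James")

# Pattern pieces:
#   ".*, "     everything up to and including the ", " separator
#   "([^.]*)"  capture group 1: a run of non-dot characters (the title)
#   "\\..*"    the literal dot and the rest of the string
# The whole string matches, and "\\1" replaces it with the captured title.
title <- sub(".*, ([^.]*)\\..*", "\\1", Name)
print(title)
```

Because the pattern consumes the entire string, a single substitution is enough, which is why sub and gsub behave the same for this variant.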

1a) gsub variation. This approach with gsub also works:

> gsub(".*, |\\..*", "", Name)
[1] "Mr"   "Mrs"  "Miss" "Mrs"  "Mr"   "Mr"
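
The pattern in (1a) is an alternation, and each name contains one match for the left branch and one for the right, so both have to be removed. A small sketch of how sub and gsub differ with this pattern (x, pat, first_only and all_matches are illustrative names, not part of the answer):

```r
x   <- "Braund, Mr. Owen Harris"
pat <- ".*, |\\..*"

# sub() replaces only the first match, i.e. the leading "Braund, "
first_only <- sub(pat, "", x)

# gsub() replaces every match, so the trailing ". Owen Harris" goes too
all_matches <- gsub(pat, "", x)

print(first_only)   # "Mr. Owen Harris"
print(all_matches)  # "Mr"
```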

2) strapplyc. Alternatively, using strapplyc in the gsubfn package, it can be done with a simpler regular expression:

> library(gsubfn)
> strapplyc(Name, ", ([^.]*)\\.", simplify = TRUE)
[1] "Mr"   "Mrs"  "Miss" "Mrs"  "Mr"   "Mr"

2a) strapplyc variation. This one seems to have the simplest regular expression of them all. It simply takes the second word of each string, so it assumes a one-word surname.

> library(gsubfn)
> sapply(strapplyc(Name, "\\w+"), "[", 2)
[1] "Mr"   "Mrs"  "Miss" "Mrs"  "Mr"   "Mr"

3) strsplit. A third way is to use strsplit:

> sapply(strsplit(Name, ", |\\."), "[", 2)
[1] "Mr"   "Mrs"  "Miss" "Mrs"  "Mr"   "Mr"
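
To see why element 2 is the title in (3), it may help to look at the pieces strsplit produces for a single name. A minimal sketch (x, parts and title are illustrative names):

```r
x <- "Cumings, Mrs. John Bradley (Florence Briggs Thayer)"

# Split on either ", " or a literal dot; strsplit returns a list, one entry per input
parts <- strsplit(x, ", |\\.")[[1]]
print(parts)
# pieces: "Cumings", "Mrs", " John Bradley (Florence Briggs Thayer)"

# The surname is piece 1, so the title is piece 2
title <- parts[2]
print(title)
```

Note that the third piece keeps its leading space, which does not matter here because only the second piece is used.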

Added additional solutions. Changed gsub to sub (although gsub works too).

answered Oct 31 '22 01:10

G. Grothendieck

Not that there's anything lacking in G. Grothendieck's answer; I just want to add a solution using sub and non-greedy repetition:

vec <- c("Moran, Mr. James",
         "Rothschild, Mrs. Martin (Elizabeth L. Barrett)")

sub(".*, (.+?)\\..*", "\\1", vec)
# [1] "Mr"  "Mrs"

Another alternative with regexpr, regmatches, and lookbehind/lookahead:

regmatches(vec, regexpr("(?<=, ).+?(?=\\.)", vec, perl = TRUE))
# [1] "Mr"  "Mrs"
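
One thing to keep in mind with the regmatches approach: elements without a match are dropped, so the result can be shorter than the input. A sketch of one way to keep the positions aligned, assuming NA is an acceptable placeholder (the extra "no title here" element and the names m, found and titles are made up for illustration):

```r
vec <- c("Moran, Mr. James",
         "no title here",   # hypothetical element without ", " or "."
         "Rothschild, Mrs. Martin (Elizabeth L. Barrett)")

# regexpr() returns -1 for elements where the pattern does not match
m <- regexpr("(?<=, ).+?(?=\\.)", vec, perl = TRUE)
found <- m != -1

# Fill only the matching positions; the others stay NA
titles <- rep(NA_character_, length(vec))
titles[found] <- regmatches(vec, m)
print(titles)
```

This way titles has the same length as vec and can be assigned back as a column alongside the names.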

answered Oct 31 '22 02:10

Sven Hohenstein
