Positive lookahead in R

Tags:

regex

r

Novice on regular expressions here ...

Assume the following names:

names <- c("Jackson, Michael", "Lennon, John", "Obama, Barack")

I want to truncate each name so that it retains all the characters up to and including the first letter of the first name. The results would look like this:

Jackson, M
Lennon, J
Obama, B

I know this has a simple solution, but I am stuck on specifying what seems to be a reasonable approach: a positive lookahead regex. I am specifying a match based on the comma, the space, and the first capital letter. This is what I have, but it is obviously wrong:

names.reduced <- gsub("(?=\\,\\s[A-Z]).*", "", names)

asked Apr 11 '15 02:04

Brian P

3 Answers

(?= ... ) is a zero-width assertion, which does not consume any characters in the string.

It only matches a position in the string. A zero-width assertion checks whether a pattern can or cannot be matched looking ahead from the current position, without adding anything to the overall match. In this case, a lookahead assertion is not necessary at all.
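To make the zero-width behaviour concrete, here is a small sketch of what the pattern from the question does once lookarounds are enabled. It assumes base R with perl = TRUE, since lookahead syntax is generally not available in base R's default regex engine; the variable name stripped is only illustrative.

```r
names <- c("Jackson, Michael", "Lennon, John", "Obama, Barack")

# The lookahead succeeds at the position just before the comma but consumes
# nothing, so the following .* starts at the comma and matches the rest of
# the string. Everything from the comma onward is therefore removed.
stripped <- gsub("(?=,\\s[A-Z]).*", "", names, perl = TRUE)
print(stripped)
```

The comma, the space, and the initial are lost because the assertion only locates a position; the characters it inspected are still available to be matched, and removed, by the .* that follows.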

You can do this with a capture group, backreferencing the group in the replacement string:

sub('(.*[A-Z]).*', '\\1', names)
# [1] "Jackson, M" "Lennon, J"  "Obama, B"

Or better yet, you can use a negated character class to remove the trailing run of characters that are not uppercase A to Z at the end of the string:

sub('[^A-Z]*$', '', names)
# [1] "Jackson, M" "Lennon, J"  "Obama, B"
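Both patterns above rely on the first-name initial being the last uppercase letter in each string. A minimal sketch, using a hypothetical input that contains a middle name, shows where that assumption matters, along with one possible anchored variant that stops at the first capital after the comma. The vector x and the anchored pattern are illustrations, not part of the original answer.

```r
x <- c("Jackson, Michael Joseph", "Lennon, John")

# Greedy .* runs to the LAST uppercase letter, so a middle name is kept.
last_upper <- sub("(.*[A-Z]).*", "\\1", x)

# Anchored variant: everything up to the first comma, optional whitespace,
# then exactly one uppercase letter; the rest of the string is dropped.
first_initial <- sub("^([^,]*,\\s*[A-Z]).*", "\\1", x)

print(last_upper)
print(first_initial)
```

For data shaped exactly like the question's (one surname, one first name), the two approaches should give the same result.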

answered Sep 24 '22 15:09

hwnd

You can use a lookbehind instead of the lookahead assertion:

sub('(?<=, [A-Z]).*$', '', names, perl=TRUE)
#[1] "Jackson, M" "Lennon, J"  "Obama, B"
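As a related sketch, PCRE's \K escape can give a similar effect without a lookbehind: it discards whatever has been matched so far from the reported match, so only the text after it is replaced. This assumes perl = TRUE and is an alternative illustration rather than part of the answer above; the name reduced is hypothetical.

```r
names <- c("Jackson, Michael", "Lennon, John", "Obama, Barack")

# ", [A-Z]" must match first; \K then resets the start of the match, so only
# the remaining characters (.*$) are replaced with the empty string.
reduced <- sub(", [A-Z]\\K.*$", "", names, perl = TRUE)
print(reduced)
```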

answered Sep 22 '22 15:09

akrun

You could also use the regmatches function.

> names <- c("Jackson, Michael", "Lennon, John", "Obama, Barack")
> regmatches(names, regexpr(".*,\\s*[A-Z]", names))
[1] "Jackson, M" "Lennon, J"  "Obama, B"
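One thing to be aware of with this approach: regmatches() combined with regexpr() returns only the elements that matched, so the result can be shorter than the input. A minimal sketch, using a hypothetical vector x with one entry that has no comma, keeps the output aligned with the input by filling non-matches with NA.

```r
x <- c("Jackson, Michael", "Prince", "Obama, Barack")

m <- regexpr(".*,\\s*[A-Z]", x)

# regmatches() silently drops the non-matching element "Prince".
matched_only <- regmatches(x, m)

# Pre-fill with NA and assign matches back by position to preserve length.
aligned <- rep(NA_character_, length(x))
aligned[m != -1] <- regmatches(x, m)

print(matched_only)
print(aligned)
```

This matters mostly when the result is assigned back into a data frame column, where a shorter vector would cause an error or misalignment.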

Or use stri_extract from the stringi package:

> library(stringi)
> stri_extract(names, regex=".*,\\s*[A-Z]")
[1] "Jackson, M" "Lennon, J"  "Obama, B"

Or just match all the characters up to the last uppercase letter.

> stri_extract(names, regex=".*[A-Z]")
[1] "Jackson, M" "Lennon, J"  "Obama, B"

answered Sep 25 '22 15:09

Avinash Raj
