Negative lookahead in R not behaving as expected

Tags:

regex

r

I am trying to replace abc wherever it starts a word in a text I'm working with in R. The output text is highlighted in HTML over a couple of passes, so I need the replacement to ignore text inside HTML tags (between the angle brackets).

The following seems to work in Python but I'm not getting any hits on my regex in R. All help appreciated.

test <- 'abcdef abc<span abc>defabc abcdef</span> abc defabc'
gsub('\\babc\\(?![^<]*>\\)', 'xxx', test)

Expected output:

xxxdef xxx<span abc>defabc xxxdef</span> xxx defabc

Instead it is ignoring all instances of abc.

asked Apr 17 '17 19:04

Rich Ard

1 Answer

You need to remove unnecessary escapes and use perl=TRUE:

test <- 'abcdef abc<span abc>defabc abcdef</span> abc defabc'
gsub('\\babc(?![^<]*>)', 'xxx', test, perl=TRUE)
## => [1] "xxxdef xxx<span abc>defabc xxxdef</span> xxx defabc"

See the online R demo

When you escape (, it matches a literal ( symbol. So, in your pattern, \\(?![^<]*>\\) matches an optional literal ( (1 or 0 times), then a literal !, then 0+ chars other than <, then > and a literal ). In my regex, (?![^<]*>) is a negative lookahead that fails the match if an abc is followed by any 0+ chars other than < and then a >, which is exactly the case when the abc sits inside a tag.
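
To see what the escaped version really looks for, here is a small sketch that runs the original pattern against a few made-up strings. The samples vector is hypothetical (it is not from the question) and only exists to contain the literal characters the escaped pattern requires.

```r
# Hypothetical inputs (not from the question) that contain the literal
# characters the escaped pattern requires right after "abc".
samples <- c('abc(!foo>) abcdef', 'abc!>) abcdef', 'abcdef')

# \\(? is an optional literal "(", ! is a literal "!", and \\) is a literal ")",
# so only "abc(!foo>)" and "abc!>)" are matched; a plain "abc" never is.
escaped <- gsub('\\babc\\(?![^<]*>\\)', 'xxx', samples)
print(escaped)
```

The first two strings come back as "xxx abcdef" and the third is left untouched, which is why the question's input showed no replacements at all.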

Without perl=TRUE, R gsub uses the TRE regex flavor that does not support lookarounds (even lookaheads). Thus, you have to tell gsub via perl=TRUE that you want the PCRE engine to be used.
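
If you want to check the engine difference yourself, a minimal sketch is to run the same lookahead pattern with and without perl=TRUE. As far as I know the default engine rejects the (?! syntax with an "invalid regular expression" error, but the exact message may differ between R versions, so the code below just captures whatever condition is raised rather than relying on its text. The variable names are mine.

```r
test <- 'abcdef abc<span abc>defabc abcdef</span> abc defabc'
pattern <- '\\babc(?![^<]*>)'

# Default engine (TRE): lookaheads are not supported, so capture the error
# message (if one is raised) instead of letting it stop the script.
tre_result <- tryCatch(gsub(pattern, 'xxx', test),
                       error = function(e) conditionMessage(e))
print(tre_result)

# PCRE engine: the lookahead is understood, so "abc" inside <span abc> is skipped.
pcre_result <- gsub(pattern, 'xxx', test, perl = TRUE)
print(pcre_result)
```

Only the perl = TRUE call should give the expected "xxxdef xxx<span abc>defabc xxxdef</span> xxx defabc".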

See the online PCRE regex demo.

answered Oct 25 '22 05:10

Wiktor Stribiżew
