Given this vector:
ba <- c('baa','aba','abba','abbba','aaba','aabba')
I want to change the final a of each word to i, except in baa and aba.
I wrote the following line:
gsub('(?<=a[ab]b{1,2})a','i',ba,perl=T)
but got this error: PCRE pattern compilation error 'lookbehind assertion is not fixed length' at ')a'.
I looked around a little and apparently R's PCRE engine (perl=TRUE) allows a variable-width lookahead but not a variable-width lookbehind. Is there a workaround for this problem? Thanks!
You can use the lookbehind alternative \K instead. This escape sequence resets the starting point of the reported match, so any previously consumed characters are no longer included in it.
Quoting rexegg:
The key difference between \K and a lookbehind is that in PCRE, a lookbehind does not allow you to use quantifiers: the length of what you look for must be fixed. On the other hand, \K can be dropped anywhere in a pattern, so you are free to have any quantifiers you like before \K.
Using it in context:
sub('a[ab]b{1,2}\\Ka', 'i', ba, perl=T)
# [1] "baa" "aba" "abbi" "abbbi" "aabi" "aabbi"
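To see what "resets the starting point of the reported match" means in practice, here is a minimal sketch using base R's regexpr and regmatches with the same pattern. The variable name m is only illustrative. The prefix a[ab]b{1,2} still has to match, but only the final a is reported:

```r
ba <- c('baa', 'aba', 'abba', 'abbba', 'aaba', 'aabba')

# Locate the first match in each string; \K drops the prefix from the reported match
m <- regexpr('a[ab]b{1,2}\\Ka', ba, perl = TRUE)

as.integer(m)      # 1-based start of the reported match, -1 where nothing matched
# [1] -1 -1 4 5 4 5

regmatches(ba, m)  # the reported text is just the trailing "a"
# [1] "a" "a" "a" "a"
```

Because only that single character is reported, sub replaces just it and leaves the prefix untouched.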
Avoiding lookarounds:
sub('(a[ab]b{1,2})a', '\\1i', ba)
# [1] "baa" "aba" "abbi" "abbbi" "aabi" "aabbi"
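If you would rather keep a real lookbehind, one further option for this particular pattern is to spell out each possible width as its own alternative. This is a sketch that relies on PCRE accepting top-level alternatives of different lengths inside a lookbehind, as long as each alternative is itself fixed length. It only scales to small bounded ranges such as b{1,2}:

```r
ba <- c('baa', 'aba', 'abba', 'abbba', 'aaba', 'aabba')

# b{1,2} expanded by hand: one 3-character branch and one 4-character branch
res <- sub('(?<=a[ab]b|a[ab]bb)a', 'i', ba, perl = TRUE)
res
# [1] "baa" "aba" "abbi" "abbbi" "aabi" "aabbi"
```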
Another solution, for the current case only (where the only quantifier used is a limiting quantifier), is to use stringr::str_replace_all / stringr::str_replace:
> library(stringr)
> str_replace_all(ba, '(?<=a[ab]b{1,2})a', 'i')
[1] "baa" "aba" "abbi" "abbbi" "aabi" "aabbi"
It works because stringr regex functions are based on the ICU regex library, which supports constrained-width lookbehinds:
The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)
So you cannot use arbitrary patterns inside ICU lookbehinds, but it is good to know that a limiting quantifier is allowed there when you need to match overlapping text within a known distance range.