I was reading The Greatest Regex Trick Ever, which covers the "we want something unless..." case using (*SKIP)(*FAIL). I took it for a spin on the toy example below: it works in base R but raises the following error in stringi. Do I need to do something different to get this syntax to work in stringi?
x <- c("I shouldn't", "you should", "I know", "'bout time")
pat <- '(?:houl)(*SKIP)(*FAIL)|(ou)'
grepl(pat, x, perl = TRUE)
## [1] FALSE TRUE FALSE TRUE
stringi::stri_detect_regex(x, pat)
## Error in stringi::stri_detect_regex(x, pat) :
## Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)
The stringi package (and stringr, which is built on it) uses the ICU regex library, which does not support the (*SKIP)(*FAIL) verbs. They are backtracking control verbs understood by PCRE, the engine base R uses when perl = TRUE.
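If the same call has to accept patterns that only PCRE understands, one option is to fall back to base R whenever ICU rejects the pattern. This is only a sketch: detect_with_fallback is a hypothetical helper name, not part of stringi, and it assumes the stringi package is installed.

```r
# Hypothetical helper (not part of stringi): try ICU first and fall back
# to base R's PCRE engine if ICU raises an error for the pattern.
detect_with_fallback <- function(x, pattern) {
  tryCatch(
    stringi::stri_detect_regex(x, pattern),
    error = function(e) grepl(pattern, x, perl = TRUE)
  )
}

x <- c("I shouldn't", "you should", "I know", "'bout time")
pat <- '(?:houl)(*SKIP)(*FAIL)|(ou)'
detect_with_fallback(x, pat)
## [1] FALSE TRUE FALSE TRUE
```

Keep in mind that the two engines are not interchangeable in every respect. For example, grepl returns FALSE for NA input where stringi returns NA, so a fallback like this is probably best kept to quick experiments.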
Since you are matching an ou that is neither preceded by h nor followed by l, you can use ordinary lookarounds:
(?<!h)ou(?!l)
See the regex demo
> x <- c("I shouldn't", "you should", "I know", "'bout time")
> pat1 <- "(?<!h)ou(?!l)"
> stringi::stri_detect_regex(x, pat1)
[1] FALSE TRUE FALSE TRUE
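One caveat worth checking: the lookaround version is slightly stricter than the (*SKIP)(*FAIL) original, because it rejects any ou that follows h or precedes l, not only an ou inside houl. The sketch below uses extra test words (hour and soul, chosen here purely for illustration) and shows an alternation, (?<!h)ou|ou(?!l), as one possible closer equivalent. It assumes stringi is installed.

```r
# Extra test words chosen for illustration: "hour" and "soul" contain
# "ou" next to only one of the two excluded neighbours.
y <- c("hour", "soul", "should")

# PCRE original: only an "ou" inside "houl" is skipped.
grepl('(?:houl)(*SKIP)(*FAIL)|(ou)', y, perl = TRUE)
## [1] TRUE TRUE FALSE

# Lookaround version: "ou" must have neither "h" before it nor "l" after it.
stringi::stri_detect_regex(y, "(?<!h)ou(?!l)")
## [1] FALSE FALSE FALSE

# Alternation: accept "ou" unless it has both "h" before and "l" after.
pat_alt <- "(?<!h)ou|ou(?!l)"
stringi::stri_detect_regex(y, pat_alt)
## [1] TRUE TRUE FALSE
```

On the four strings from the question, both ICU patterns give the same result as the PCRE original, so the difference only matters if inputs like hour or soul can occur.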
Here is another approach. If you only need a boolean indicating whether a string contains ou and does not contain houl anywhere (a whole-string test, so "you should" is rejected because of should), you can use
stringi::stri_detect_regex(x, "^(?!.*houl).*ou")
See another regex demo
Details
^ - start of the string
(?!.*houl) - a negative lookahead that fails the match if, right after the start of the string, there are 0+ chars other than line break chars, as many as possible, followed by houl
.* - 0+ chars other than line break chars, as many as possible
ou - an ou substring.
See Lookahead and Lookbehind Zero-Length Assertions for more details.
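To see what the whole-string pattern does on the question's data, it can be run next to the same logic written as two fixed-string searches. This is a minimal sketch that assumes stringi is installed and that the inputs are single-line strings (the . in the regex does not cross line breaks).

```r
x <- c("I shouldn't", "you should", "I know", "'bout time")

# Whole-string check: TRUE only if the string has "ou" and no "houl" at all.
stringi::stri_detect_regex(x, "^(?!.*houl).*ou")
## [1] FALSE FALSE FALSE TRUE

# The same logic as two fixed-string searches combined with & and !.
stringi::stri_detect_fixed(x, "ou") & !stringi::stri_detect_fixed(x, "houl")
## [1] FALSE FALSE FALSE TRUE
```

The second element differs from the grepl result in the question: "you should" has an ou outside houl, but the string as a whole still contains houl.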
Note that in ICU a lookbehind cannot contain patterns of unbounded width; limiting quantifiers inside lookbehinds are supported, however. So, in stringi, if you want to match an ou that has no s earlier in the same word, you can use
> pat2 <- "(?<!s\\w{0,100})ou"
> stringi::stri_detect_regex(x, pat2)
[1] FALSE TRUE FALSE TRUE
Here, the constrained-width lookbehind (?<!s\\w{0,100}) fails the match if ou is preceded by s followed by 0 to 100 alphanumeric or underscore characters.
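The difference between an unbounded and a bounded lookbehind can be checked directly. The sketch below assumes stringi is installed and wraps the unbounded variant in tryCatch, since ICU is expected to reject it; the exact error text depends on the ICU version, so it is not reproduced here.

```r
x <- c("I shouldn't", "you should", "I know", "'bout time")

# Unbounded quantifier (*) inside the lookbehind: ICU is expected to
# raise an error, which is captured here as a message string.
res <- tryCatch(
  stringi::stri_detect_regex(x, "(?<!s\\w*)ou"),
  error = function(e) conditionMessage(e)
)
is.character(res)  # TRUE if the pattern was rejected

# Bounded quantifier ({0,100}): accepted by ICU.
stringi::stri_detect_regex(x, "(?<!s\\w{0,100})ou")
## [1] FALSE TRUE FALSE TRUE
```

The upper bound of 100 is arbitrary; any limit comfortably larger than the longest expected word should work.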