Very simple problem. I just need to capture some strings using a regex positive lookbehind, but I don't see a way to do it.
Here's an example. Suppose I have some strings:
library(stringr)
myStrings <- c("MFG: acme", "something else", "MFG: initech")
I want to extract the words that are prefixed with "MFG:":
> result_1 <- str_extract(myStrings,"MFG\\s*:\\s*\\w+")
>
> result_1
[1] "MFG: acme" NA "MFG: initech"
That almost does it, but I don't want to include the "MFG:" part, so that's what a "positive lookbehind" is for:
> result_2 <- str_extract(myStrings,"(?<=MFG\\s*:\\s*)\\w+")
Error in stri_extract_first_regex(string, pattern, opts_regex = attr(pattern, :
Look-Behind pattern matches must have a bounded maximum length. (U_REGEX_LOOK_BEHIND_LIMIT)
>
It is complaining about needing a "bounded maximum length", but I don't see where to specify that. How do I make positive-lookbehind work? Where, exactly, can I specify this "bounded maximum length"?
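You can use a regex lookbehind, but ICU (the regex engine behind stringr) requires the lookbehind pattern to have a bounded maximum length, so unbounded quantifiers such as \\s* are not allowed inside it. If the prefix is always "MFG:" followed by exactly one whitespace character, a fixed-width lookbehind works: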
str_extract(myStrings, "(?<=MFG:\\s)\\w+")
#[1] "acme" NA "initech"
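To see what "exact matches" means in practice, here is a small sketch of where the fixed-width lookbehind stops working. The `spacing` vector is made up for illustration and is not from the question:

```r
library(stringr)

# Hypothetical inputs with zero, one, and two spaces after the colon
spacing <- c("MFG:acme", "MFG: acme", "MFG:  acme")

# The lookbehind is exactly five characters wide: "MFG:" plus one whitespace
fixed <- str_extract(spacing, "(?<=MFG:\\s)\\w+")
fixed
# Only the middle element matches; the other two come back as NA
```

So this approach is only suitable when the spacing after "MFG:" is known and constant.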
You need to use str_match, since a lookbehind pattern must have a bounded length and you do not know the number of whitespace characters in advance:
> result_1 <- str_match(myStrings,"MFG\\s*:\\s*(\\w+)")
> result_1[,2]
##[1] "acme" NA "initech"
The results you need will be in the second column.
Note that str_extract cannot be used with this pattern, since that function returns only the whole match and drops the captured groups.
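If you need this in more than one place, the str_match approach can be wrapped in a small function. This is only a sketch; `extract_mfg` is a name invented here, not something stringr provides:

```r
library(stringr)

# Hypothetical helper: return the word following "MFG:", or NA if there is none
extract_mfg <- function(x) {
  # Column 1 of the result holds the whole match, column 2 the first capture group
  m <- str_match(x, "MFG\\s*:\\s*(\\w+)")
  m[, 2]
}

mfg <- extract_mfg(c("MFG: acme", "something else", "MFG   :initech"))
mfg
# One element per input: "acme", NA, "initech"
```

Because the capture group does the trimming, the whitespace around the colon can vary freely.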
And a bonus: a lookbehind in ICU regex cannot be infinite-width, but it can be variable-width as long as its maximum length is bounded. So replacing * with a bounded quantifier such as {0,100} will also work:
> result_1 <- str_extract(myStrings,"(?<=MFG\\s{0,100}:\\s{0,100})\\w+")
> result_1
[1] "acme" NA "initech"
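As a quick check that the two approaches agree when the spacing varies, here is a sketch with made-up inputs (`varied` is not from the question, and the upper bound of 100 is just the arbitrary limit used above):

```r
library(stringr)

# Hypothetical inputs with different amounts of whitespace around the colon
varied <- c("MFG:acme", "MFG   :   initech", "something else")

# Bounded-width lookbehind: {0,100} gives ICU the maximum length it requires
by_lookbehind <- str_extract(varied, "(?<=MFG\\s{0,100}:\\s{0,100})\\w+")

# Capture-group version for comparison
by_capture <- str_match(varied, "MFG\\s*:\\s*(\\w+)")[, 2]

identical(by_lookbehind, by_capture)
```

The lookbehind version would miss a prefix with more than 100 whitespace characters on either side of the colon, while the capture-group version has no such limit.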