I need to match any 'r' that is preceded by two different vowels. For example, 'our' or 'pear' would be matching but 'bar' or 'aar' wouldn't. I did manage to match for the two different vowels, but I still can't make that the condition (...) of lookbehind for the ensuing 'r'. Neither (?<=...)r nor ...\\Kr yields any results. Any ideas?
    x <- c('([aeiou])(?!\\1)(?=(?1))')
    y <- c('our','pear','bar','aar')
    y[grepl(paste0(x,collapse=''),y,perl=T)]
    ## [1] "our"  "pear"
There are two types of lookarounds: lookahead and lookbehind. Lookbehind matches text that is preceded by a user-specified pattern. A positive lookbehind is written (?<=a)something and can be combined with any other regex construct; it matches any "something" that is immediately preceded by an "a".
The negative lookahead (?!n) is not a quantifier but an assertion: it succeeds at any position that is not followed by the string n.
Lookbehind has the same effect, but works backwards. It tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there.
A negative look-ahead, on the other hand, is when you want to find an expression A that does not have an expression B (i.e., the pattern) after it. Its syntax is: A(?!B) . In a way, it is the opposite of a positive look-ahead.
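As a small illustration of the two constructs described above, here is a sketch in R. The test strings are made up for this example and do not come from the question; `perl = TRUE` is needed because lookarounds are a PCRE feature.

```r
# Toy strings, chosen only for illustration (not from the question)
s <- c("asomething", "bsomething", "foobar", "foobaz")

# Positive lookbehind: "something" only when immediately preceded by "a"
lb <- grepl("(?<=a)something", s, perl = TRUE)
lb
## [1]  TRUE FALSE FALSE FALSE

# Negative lookahead: "foo" only when NOT followed by "bar"
la <- grepl("foo(?!bar)", s, perl = TRUE)
la
## [1] FALSE FALSE FALSE  TRUE
```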
These two solutions seem to work:
the why not way:
    x <- '(?<=a[eiou]|e[aiou]|i[aeou]|o[aeiu]|u[aeio])r'
    y[grepl(x, y, perl=T)]
the \K way:
    x <- '([aeiou])(?!\\1)[aeiou]\\Kr'
    y[grepl(x, y, perl=T)]
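To see what `\K` does here, it can help to look at the matched text rather than only at `grepl()`. The sketch below assumes the same `y` as in the question. Unlike the pattern in the question, which only asserts the second vowel with a lookahead, this pattern consumes both vowels and then uses `\K` to drop them from the reported match, so only the "r" remains.

```r
y <- c('our', 'pear', 'bar', 'aar')
x <- '([aeiou])(?!\\1)[aeiou]\\Kr'

# Start position of the reported match in each string (-1 means no match)
pos <- as.integer(regexpr(x, y, perl = TRUE))
pos
## [1]  3  4 -1 -1

# The matched text itself: only the "r", because \K discards the vowels
m <- regmatches(y, regexpr(x, y, perl = TRUE))
m
## [1] "r" "r"
```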
A variant of the why not way (may be more efficient because it searches for the "r" first):
    x <- 'r(?<=a[eiou]r|e[aiou]r|i[aeou]r|o[aeiu]r|u[aeio]r)'
or, to quickly exclude an "r" that is not preceded by two vowels (without testing the whole alternation):
    x <- 'r(?<=[aeiou][aeiou]r)(?<=a[eiou]r|e[aiou]r|i[aeou]r|o[aeiu]r|u[aeio]r)'
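A quick way to check that the four patterns in this answer agree is to run them side by side. This is only a sanity check, not a benchmark; the names in `patterns` and the extra words 'beer' and 'their' are my own additions for illustration.

```r
# 'beer' and 'their' are extra test words, not part of the original question
y <- c('our', 'pear', 'bar', 'aar', 'beer', 'their')

# Labels are hypothetical, chosen only to tell the variants apart
patterns <- c(
  lookbehind  = '(?<=a[eiou]|e[aiou]|i[aeou]|o[aeiu]|u[aeio])r',
  keep_out    = '([aeiou])(?!\\1)[aeiou]\\Kr',
  r_first     = 'r(?<=a[eiou]r|e[aiou]r|i[aeou]r|o[aeiu]r|u[aeio]r)',
  r_prefilter = 'r(?<=[aeiou][aeiou]r)(?<=a[eiou]r|e[aiou]r|i[aeou]r|o[aeiu]r|u[aeio]r)'
)

# Apply every pattern to the same vector and keep the matching words
results <- lapply(patterns, function(p) y[grepl(p, y, perl = TRUE)])
results
# every element should be: "our" "pear" "their"
```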
As HamZa points out in the comments, using the skip and fail verbs is one way to do what we want. Basically, we tell the engine to ignore cases where we have two identical vowels followed by "r".
    # The following is the beginning of the regex and isn't just R code
    # the ([aeiou]) captures the first vowel, the \\1 references what we captured
    # so this gives us the same vowel two times in a row
    # which we then follow with an "r"
    # Then we tell it to skip/fail for this
    ([aeiou])\\1r(*SKIP)(*FAIL)
Having told it to skip those cases, we now add the alternative "or cases where we have two vowels followed by an 'r'". Since we already eliminated the cases where those two vowels are the same, this gets us what we want.
    |[aeiou]{2}r
Putting it together, we end up with
    y <- c('our','pear','bar','aar', "aa", "ae", "are", "aeer", "ssseiras")
    grep("([aeiou])\\1r(*SKIP)(*FAIL)|[aeiou]{2}r", y, perl = TRUE, value = TRUE)
    #[1] "our"      "pear"     "ssseiras"
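To make the effect of the `(*SKIP)(*FAIL)` branch visible, the sketch below compares the full pattern with the second alternative on its own, using the same `y`. The variable names `any_two` and `filtered` are mine. The words that drop out are exactly the ones where the "r" follows a doubled vowel.

```r
y <- c('our','pear','bar','aar', "aa", "ae", "are", "aeer", "ssseiras")

# Second alternative alone: any two vowels followed by "r"
any_two <- grep("[aeiou]{2}r", y, perl = TRUE, value = TRUE)

# Full pattern: doubled vowel + "r" is skipped first, then the same test
filtered <- grep("([aeiou])\\1r(*SKIP)(*FAIL)|[aeiou]{2}r", y,
                 perl = TRUE, value = TRUE)

# Words removed by the skip/fail branch
setdiff(any_two, filtered)
## [1] "aar"  "aeer"
```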