
R: lookaround within lookaround

I need to match any 'r' that is preceded by two different vowels. For example, 'our' or 'pear' would match, but 'bar' or 'aar' wouldn't. I managed to match the two different vowels, but I still can't make that the condition (...) of a lookbehind for the ensuing 'r'. Neither (?<=...)r nor ...\\Kr yields any results. Any ideas?

x <- c('([aeiou])(?!\\1)(?=(?1))')
y <- c('our','pear','bar','aar')
y[grepl(paste0(x,collapse=''),y,perl=T)]
## [1] "our"  "pear"
dasf asked Apr 13 '15 12:04




2 Answers

These two solutions seem to work:

the why not way:

x <- '(?<=a[eiou]|e[aiou]|i[aeou]|o[aeiu]|u[aeio])r'
y[grepl(x, y, perl=T)]

the \K way:

x <- '([aeiou])(?!\\1)[aeiou]\\Kr'
y[grepl(x, y, perl=T)]

A variant of the why not way (it may be more efficient because it looks for the "r" first):

x <- 'r(?<=a[eiou]r|e[aiou]r|i[aeou]r|o[aeiu]r|u[aeio]r)' 

Or, to quickly reject an "r" that is not preceded by two vowels (without testing the whole alternation):

x <- 'r(?<=[aeiou][aeiou]r)(?<=a[eiou]r|e[aiou]r|i[aeou]r|o[aeiu]r|u[aeio]r)' 
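
The four patterns above can be checked side by side. The following is only a minimal sketch, assuming the vector y from the question; the names patterns and results are hypothetical and used purely for illustration.

```r
# Test vector from the question
y <- c('our', 'pear', 'bar', 'aar')

# The four patterns from this answer, under illustrative (hypothetical) names
patterns <- c(
  lookbehind   = '(?<=a[eiou]|e[aiou]|i[aeou]|o[aeiu]|u[aeio])r',
  keep_out     = '([aeiou])(?!\\1)[aeiou]\\Kr',
  r_first      = 'r(?<=a[eiou]r|e[aiou]r|i[aeou]r|o[aeiu]r|u[aeio]r)',
  r_first_fast = 'r(?<=[aeiou][aeiou]r)(?<=a[eiou]r|e[aiou]r|i[aeou]r|o[aeiu]r|u[aeio]r)'
)

# For each pattern, keep the elements of y that contain a match
results <- lapply(patterns, function(p) y[grepl(p, y, perl = TRUE)])
print(results)
```

If the patterns are equivalent on this input, every element of results should hold the same two words, 'our' and 'pear'.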
Casimir et Hippolyte answered Sep 24 '22 05:09


As HamZa points out in the comments, using the (*SKIP) and (*FAIL) verbs is one way to do what we want. Basically, we tell the engine to ignore cases where we have two identical vowels followed by "r".

# The following is the beginning of the regex and isn't just R code
# the ([aeiou]) captures the first vowel, the \\1 references what we captured
# so this gives us the same vowel two times in a row
# which we then follow with an "r"
# Then we tell it to skip/fail for this
([aeiou])\\1r(*SKIP)(*FAIL)

Having told it to skip those cases, we now add an alternative: "or cases where we have two vowels followed by an 'r'". Since we already eliminated the cases where those two vowels are the same, this gets us what we want.

|[aeiou]{2}r 

Putting it together we end up with

y <- c('our','pear','bar','aar', "aa", "ae", "are", "aeer", "ssseiras")
grep("([aeiou])\\1r(*SKIP)(*FAIL)|[aeiou]{2}r", y, perl = TRUE, value = TRUE)
#[1] "our"      "pear"     "ssseiras"
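
One difference from the lookbehind approach may be worth seeing: the skip/fail pattern consumes both vowels along with the "r", whereas the lookbehind pattern from the other answer consumes only the "r". The sketch below extracts the matched text with regmatches() and regexpr(); the helper matched_text and the variable names are hypothetical, introduced only for this illustration.

```r
y <- c('our', 'pear', 'bar', 'aar', 'aa', 'ae', 'are', 'aeer', 'ssseiras')

# Pattern from this answer, and the lookbehind pattern from the other answer
skip_fail  <- '([aeiou])\\1r(*SKIP)(*FAIL)|[aeiou]{2}r'
lookbehind <- '(?<=a[eiou]|e[aiou]|i[aeou]|o[aeiu]|u[aeio])r'

# Hypothetical helper: return the text of the first match in each element
# (elements without a match are dropped by regmatches)
matched_text <- function(pattern, x) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

skip_fail_hits  <- matched_text(skip_fail, y)
lookbehind_hits <- matched_text(lookbehind, y)

print(skip_fail_hits)
print(lookbehind_hits)
```

For filtering with grep() or grepl() the two approaches select the same elements here; the distinction only matters if the matched text itself is used, for example in a replacement.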
Dason answered Sep 24 '22 05:09