
What is an alternative for lookbehind with C++ RegEx?

I am using the following pattern:

(?<=<)(?<!>).*?q.*?(?!<)(?=>)

It uses positive and negative lookahead and lookbehind to match text containing a literal q that is enclosed in a matching pair of angle brackets.

std::regex does not support lookbehind. So what would be a good alternative?
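The limitation can be checked with a minimal sketch like the one below. The helper name `compiles` is hypothetical, and the sketch assumes the implementation reports an unsupported construct by throwing `std::regex_error` from the `std::regex` constructor; the specific error code is left unchecked because it may differ between standard libraries.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reports whether std::regex accepts `pattern`
// under its default ECMAScript grammar.
bool compiles(const std::string& pattern)
{
    try {
        std::regex re(pattern);  // construction parses the pattern
        return true;
    } catch (const std::regex_error&) {
        return false;            // the pattern was rejected
    }
}
```

Under that assumption, `compiles("(?<=<)q")` is expected to return false (lookbehind), while `compiles("q(?=>)")` returns true (lookahead is part of the grammar).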

Joey asked Apr 19 '17 18:04


1 Answer

Note that (?<=<)(?<!>) is equivalent to (?<=<): since a < is required immediately to the left of the current location, that character cannot be a >. Likewise, (?!<)(?=>) is equivalent to (?=>): as a > must be immediately to the right, that character cannot be a <. Also, the first .*? will not necessarily match the shortest possible substring: it expands past any characters, brackets included, until it reaches a q that is followed by any 0+ chars up to the first >. So the pattern hardly works as intended even in an engine that supports lookbehind.
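The over-matching described above can be reproduced in std::regex with a consuming approximation of the original pattern, where the lookarounds are replaced by literal < and > and the text between them is captured. This is only a sketch: the function name `lazy_dot_matches` and the approximated pattern are assumptions made for illustration, not part of the question.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects Group 1 of every match of <(.*?q.*?)>,
// a consuming stand-in for the lookaround-based pattern.
std::vector<std::string> lazy_dot_matches(const std::string& text)
{
    const std::regex re("<(.*?q.*?)>");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        out.push_back(it->str(1));  // text between the consumed < and >
    return out;
}
```

For the input `<adqsdq<><abc>5<abq>6<qaz> <hjfffffffk>`, this yields `adqsdq<`, `abc>5<abq` and `qaz`: the lazy dots cross bracket boundaries, which is why excluding < and > with negated character classes is the safer choice.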

I'd rather use the <([^<>q]*q[^<>]*)> regex, which consumes the literal < and > symbols at the start and end of the expression and captures the text between them in Group 1:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::regex r("<([^<>q]*q[^<>]*)>");
    std::string s = "<adqsdq<><abc>5<abq>6<qaz> <hjfffffffk>";
    // Print Group 1 (the text between < and >) for every match
    for (std::sregex_iterator i(s.begin(), s.end(), r); i != std::sregex_iterator(); ++i)
    {
        std::cout << i->str(1) << std::endl;
    }
}

See the C++ demo

Output: abq and qaz

Wiktor Stribiżew answered Sep 19 '22 15:09