Negative look-ahead in Go regular expressions

I'm trying to use negative look-aheads in Go.

My regular expression is BBB(((?!BBB).)*)EEE, and here it is on Rubular:

http://rubular.com/r/Zw1vopp1MF

However, in Go I get:

error parsing regexp: invalid or unsupported Perl syntax: `(?!` 

Are there any alternatives?

asked Nov 06 '14 04:11 by K2xL


People also ask

What is a negative look ahead in regular expression?

In a negative lookahead, the regex engine checks whether a particular element (a character, a sequence of characters, or a group) follows the current position. If that element is absent, the assertion succeeds and matching continues; if it is present, that match attempt is rejected.

Can I use negative lookbehind?

The positive lookbehind (?<=...) and negative lookbehind (?<!...) zero-width assertions in JavaScript regular expressions check whether a pattern is, or is not, preceded by another pattern.
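Go's regexp package does not support lookbehind either. A common workaround is to match the preceding text normally and capture only the part you need. A minimal sketch, where the helper `amounts`, the variable `dollarAmount`, and the dollar-amount input are made up for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// dollarAmount would be (?<=\$)\d+ in an engine with lookbehind. Go's RE2-based
// regexp rejects that, so the '$' is matched normally and the digits are captured.
var dollarAmount = regexp.MustCompile(`\$(\d+)`)

// amounts returns the digits that follow each '$' in s.
func amounts(s string) []string {
	var out []string
	for _, m := range dollarAmount.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1]) // m[0] is the whole match, m[1] the captured digits
	}
	return out
}

func main() {
	fmt.Println(amounts("tea $3, cake $12, tip 5"))
}
```

Unlike a real lookbehind, the `$` is consumed as part of the match; that only matters when consecutive matches would need to share characters.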

What is look ahead in regex?

In Python regular expressions, lookahead is an assertion that succeeds or fails depending on whether a pattern matches ahead of, i.e. to the right of, the parser's current position. Lookaheads don't consume any characters, which is why they are called zero-width assertions.

What is positive and negative lookahead?

Positive lookahead: (?= «pattern») matches if pattern matches what comes after the current location in the input string. Negative lookahead: (?! «pattern») matches if pattern does not match what comes after the current location in the input string.


2 Answers

Go's regexp package doesn't support negative lookahead for technical reasons: it conflicts with the library's guarantee of O(n) matching time. See the golang-nuts group discussion about this, as well as the Caveats section in Regular Expression Matching in the Wild.

You can express the regular expression you've described without negative lookahead:

BBB([^B]|B[^B]|BB[^B])*EEE 

Here's an example to demonstrate:

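package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Each repetition consumes a non-B, "B" plus a non-B, or "BB" plus a non-B,
	// so the text between BBB and EEE can never contain BBB.
	re := regexp.MustCompile(`BBB([^B]|B[^B]|BB[^B])*EEE`)
	fmt.Printf("%#v\n", re.FindAllString("BBB EEE BBB..BBB...EEE", -1))
	// Prints: []string{"BBB EEE", "BBB...EEE"}
}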
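The original pattern also captured the text between the delimiters, but with the alternation above, group 1 only holds the last repetition. One option is to make the alternation non-capturing and wrap the whole repetition in a group. The sketch below also adds `B{0,2}` before `EEE`: as written, the alternation rejects input such as `BBBxBEEE` (a lone `B` right before `EEE`), which the lookahead version accepts. The adjusted pattern and the names `between` and `bodies` are my own, checked only against the inputs shown; note too that, unlike `.`, `[^B]` also matches newlines.

```go
package main

import (
	"fmt"
	"regexp"
)

// between matches BBB ... EEE where the body contains no BBB.
// (?:...) keeps the alternation non-capturing so group 1 holds the whole body;
// the trailing B{0,2} allows one or two B's directly before EEE.
var between = regexp.MustCompile(`BBB((?:[^B]|B[^B]|BB[^B])*B{0,2})EEE`)

// bodies returns the captured text between each BBB and EEE pair.
func bodies(s string) []string {
	var out []string
	for _, m := range between.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	fmt.Printf("%q\n", bodies("BBB EEE BBB..BBB...EEE")) // [" " "..."]
	fmt.Printf("%q\n", bodies("BBBxBEEE"))               // ["xB"]
}
```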

answered Sep 16 '22 19:09 by dyoo


dlclark/regexp2 is a port of the .NET framework's System.Text.RegularExpressions.Regex engine.

There are some fundamental differences between .NET strings and Go strings that required a bit of borrowing from the Go framework regex engine as well. I cleaned up a couple of the dirtier bits during the port (regexcharclass.cs was terrible), but the parse tree, code emitted, and therefore patterns matched should be identical.

It is mentioned at the end of the lengthy discussion about O(n) regular expressions, with this caveat:

However, I would advise caution as there are benefits to the re2-based engine that are not provided by more full featured engines with lookarounds. If you have the option then stick with the stdlib.

The cost of these extra features is a slower implementation, without the linear-time matching guarantee of the RE2-based stdlib engine.
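When the delimiters are fixed strings, as in the question, another stdlib-only option is to skip regular expressions and scan with the strings package. A minimal sketch with a hypothetical helper `findBetween`, assuming non-empty delimiters that share no characters and single-line input; it aims to return what the greedy pattern BBB(((?!BBB).)*)EEE would match:

```go
package main

import (
	"fmt"
	"strings"
)

// findBetween returns the non-overlapping matches that the greedy pattern
// left(((?!left).)*)right would find in s. Assumes left and right are non-empty,
// share no characters, and s has no newlines (where . would stop matching).
func findBetween(s, left, right string) []string {
	var out []string
	pos := 0
	for {
		i := strings.Index(s[pos:], left)
		if i < 0 {
			return out
		}
		start := pos + i
		body := start + len(left)
		// The body may not contain left, so it ends before the next occurrence.
		limit := len(s)
		if j := strings.Index(s[body:], left); j >= 0 {
			limit = body + j
		}
		// Greedy repetition: take the last right delimiter that fits before limit.
		if k := strings.LastIndex(s[body:limit], right); k >= 0 {
			end := body + k + len(right)
			out = append(out, s[start:end])
			pos = end
		} else {
			// No match from this start; retry one byte later, as a regex engine would.
			pos = start + 1
		}
	}
}

func main() {
	fmt.Printf("%q\n", findBetween("BBB EEE BBB..BBB...EEE", "BBB", "EEE"))
	fmt.Printf("%q\n", findBetween("BBBxBEEE", "BBB", "EEE"))
}
```

Plain index scanning keeps the linear-time behavior of the stdlib, at the cost of being specific to literal delimiters.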

answered Sep 20 '22 19:09 by Andy