Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference between non-greedy search and negated character set

Tags:

regex

Is there any difference between these two regular expression patterns (assuming single-line mode is enabled): a.*?b and a[^b]*b ? What about in terms of performance?

like image 778
mpen Avatar asked Jul 10 '14 21:07

mpen


1 Answers

a.*?b has to check at each consumed character if it matches the pattern (i.e. if the next one is a b). This is called backtracking.

With the string a12b the execution would look like this:

  • Consume a
  • Consume the following 0 characters. Is the next one a b? No.
  • Consume the following character (a1). Is the next one a b? No.
  • Consume the following character (a12). Is the next one a b? Yes!
  • Consume b
  • Match

a[^b]*b consumes anything that isn't a b without asking itself questions and is much faster for longer strings because of that.

With the string a12b the execution would look like this:

  • Consume a
  • Consume anything that follows that isn't a b. (a12)
  • Consume b
  • Match

RegexHero has a benchmark feature that will demonstrate that with the .NET regex engine.

Other than the performance difference, they match the same strings in your example.

However there are situations where there is a difference between the two. In the string aa111b111b

(?<=aa.*?)b matches both b while (?<=aa[^b]*)b matches only the first one.

like image 119
dee-see Avatar answered Oct 14 '22 01:10

dee-see