
Should atomic groups be used all the time to speed up failure?

As a common example, say we want to match some word pattern $word_pattern that may have whitespace surrounding it. This is a very common use of regexes. Normally people will write:

/\s*$word_pattern\s*/

But that is inefficient in case of failure, isn't it? Shouldn't the efficient version be:

/(?>\s*)$word_pattern\s*/

But I never see that actually written...

Addition: yes, I have now benchmarked it. Since one of the responders raised issues with whitespace, I am not using whitespace in the benchmark.

So now I have a very long file a.txt (1GB) filled entirely with the character a.

And then

perl -ne 'print !/a*b/' < a.txt

perl -ne 'print !/(?>a*)b/' < a.txt

both take a significant, but the SAME, amount of time (over and above the time it takes to read the file itself).

I don't understand that at all. Can someone explain how that can be? The Perl documentation clearly says that in the first case there would be backtracking going on.

Mark Galeck, asked Nov 12 '22

1 Answer

"Inefficient" no, but less efficient in case of failure and in case of success. You can see a real difference for a certain amount of data.

(?>\s*) or \s*+ have two consequences:

  1. Backtracking into the group is forbidden when a later part of the pattern fails (though the subpattern can still be given up as one solid block).
  2. Backtracking positions inside an atomic group are not recorded by the regex engine, so the regex engine works faster.

You can read this PerlMonks thread on the subject: http://www.perlmonks.org/?node_id=664545

Casimir et Hippolyte, answered Nov 15 '22