Should atomic groups be used all the time to speed up failure?

Question

As a common example, say we want to match some word pattern $word_pattern, but there may be whitespace surrounding it. This is a very common use of regex. Normally people write:

/\s*$word_pattern\s*/

But that is inefficient in case of failure, isn't it? Shouldn't the efficient code be:

/(?>\s*)$word_pattern\s*/

But I never see that actually written...

Addition: yes, I have now benchmarked it. Since one of the responders may have issues with whitespace here, I would rather not use whitespace in the test.

So now I have a very long file a.txt (1GB) filled entirely with the character a.

Then I run:

perl -ne 'print !/a*b/' < a.txt

perl -ne 'print !/(?>a*)b/' < a.txt

Both take a significant, but the SAME, amount of time (over and above the time it takes to read in the file itself).

I don't understand that at all. Can someone explain how that can be? The Perl documentation clearly says that in the first case there would be backtracking going on.

Casimir et Hippolyte · Accepted Answer

"Inefficient", no, but less efficient, both in case of failure and in case of success. The difference becomes noticeable once there is enough data.

(?>\s*) or \s*+ have two consequences:

Backtracking into the group is forbidden when a later part of the pattern fails (but the subpattern can still be "backtracked" as one solid block).
Backtrack positions inside an atomic group are not recorded by the regex engine, so the regex engine works faster.
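
To make the first consequence concrete, here is a minimal sketch (the test string "aaab" and the variable names are made up for illustration, and the possessive form assumes Perl 5.10 or later). The plain greedy a* can give one a back so that the following "ab" matches, while the atomic and possessive forms keep everything they consumed, so the overall match fails:

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen so that a match is only possible
# if a* gives back one of the a's it consumed.
my $text = "aaab";

# Greedy: a* first takes "aaa", then gives one "a" back so "ab" can match.
my $greedy = $text =~ /a*ab/ ? 1 : 0;

# Atomic group: whatever a* consumed is kept as one block, so "ab" never fits.
my $atomic = $text =~ /(?>a*)ab/ ? 1 : 0;

# Possessive quantifier: same behaviour as the atomic group, shorter syntax.
my $possessive = $text =~ /a*+ab/ ? 1 : 0;

print "greedy=$greedy atomic=$atomic possessive=$possessive\n";
# prints: greedy=1 atomic=0 possessive=0
```

This also shows why atomic groups cannot be used blindly: they change what the pattern matches, not only how fast it fails.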

For more on the subject, see this topic: http://www.perlmonks.org/?node_id=664545
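
If you want to check on your own machine whether the difference is measurable, a rough sketch with the core Benchmark module could look like this. The subject string, its length, and the labels are assumptions made for the sketch. The element after the whitespace is \d rather than a literal word, because with a fixed literal Perl may be able to reject the string before the matching proper begins, which could hide any difference (that is an assumption too; use re 'debug' would show what actually happens).

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a long run of spaces with no digit anywhere,
# so both patterns below are forced to fail.
my $subject = " " x 2000;

# A negative count tells Benchmark to run each sub for at least
# that many CPU seconds, then print a comparison table.
cmpthese(-1, {
    # Plain greedy quantifier: after \d fails, \s* gives back one space at a time.
    greedy => sub { $subject =~ /\s*\d/ },
    # Atomic group: nothing inside the group is retried after \d fails.
    atomic => sub { $subject =~ /(?>\s*)\d/ },
});
```

The numbers depend on the Perl version and the data, so treat the table as a measurement of your own setup rather than a general result.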

Tags:

regex

optimization

perl

Mark Galeck

1 Answers

Casimir et Hippolyte
