
Misunderstanding perl regexp evaluation

Tags:

regex

perl

Good time of day! I am reading a book about Perl: "Programming Perl" by Larry Wall, Tom Christiansen, and Jon Orwant. In this book I found several examples that the authors did not explain (or maybe I simply don't get them).

The first

This prints "hi" only ONCE.

 "adfsfloglig"=~ /.*(?{print "hi"})f/;

But this prints "hi" TWICE?? How can that be explained?

 "adfsfloglig"=~ /.*(?{print "hi"})log/;

And continuing to experiment makes things even worse:

  "adfsfloglig"=~ /.*(?{print "hi"})sflog/;

The line of code above again prints this terrifying "hi" only ONCE! After about a week I understood only one thing completely: I NEED HELP :) So I am asking you to help me, please.

The second (this is a bomb!)

 $_ = "lothiernbfj";

 m/
     (?{ $i = 0; print "setting i to 0\n" })
     (.(?{ local $i = $i + 1; print "\ti is $i"; print "\tWas founded $&\n" }))*
     (?{ print "\nchecking rollback\n" })
     er
     (?{ $result = $i; print "\nsetting result\n" })
 /x;
 print "final $result\n";

Here the $result finally printed on the screen equals the number of chars that were matched by .*, but again I don't get it.

With debug printing turned on (shown above), I see that $i is incremented every time a new char is included in $& (the matched part of the string).

In the end $i equals 11 (the number of chars in the string), then there are 7 rollbacks, where .* gives back its match one char at a time (7 times) until the whole pattern matches.

But, damn magic, the result is set to the value of $i! And we never decremented this value anywhere! So $result should equal 11! But it does not. And the authors were right. I know.

Please, can you explain this strange Perl code that I was happy to meet? Thank you for any answer!

xolodec asked Jul 20 '13 07:07


1 Answer

From the documentation at http://perldoc.perl.org/perlre.html :

"WARNING: This extended regular expression feature is considered experimental, and may be changed without notice. Code executed that has side effects may not perform identically from version to version due to the effect of future optimisations in the regex engine. The implementation of this feature was radically overhauled for the 5.18.0 release, and its behaviour in earlier versions of perl was much buggier, especially in relation to parsing, lexical vars, scoping, recursion and reentrancy."

Even on a failed match attempt, if the regex engine reaches the point where the code block sits, it runs the code. If the code only assigns to variables with local, backtracking undoes those assignments, so the abandoned attempts leave no trace. That is why $result in your second example ends up as the number of chars .* finally kept and not 11. But print operations can't be undone, with the result that you can get strings printed from attempts that later fail. This is why the documentation warns against embedding code with "side effects".
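To make the difference visible, here is a minimal sketch based on the second example from the question, with the prints removed. The counter $runs is my own addition (it is not in the book's example): it is incremented without local, so backtracking cannot undo it, whereas $i is localized and gets unwound. I am assuming a perl of 5.18 or later, given the warning quoted above.

```perl
use strict;
use warnings;

# Package variables, because local works on those and not on "my" variables.
our ($i, $result);
our $runs = 0;    # hypothetical counter, added only for illustration

$_ = "lothiernbfj";

# $i is bumped with local inside the loop; $runs is bumped without it.
m/
    (?{ $i = 0 })
    (?: . (?{ local $i = $i + 1; $runs++ }) )*
    er
    (?{ $result = $i })
/x;

print "result = $result\n";   # 5: the chars before "er" that the loop kept
print "runs   = $runs\n";     # not rolled back, so larger than $result
```

The localized $i is restored each time the engine gives a char back, so by the time "er" matches it holds only the count for the chars still in the match. $runs behaves like the print calls in the question: once it has happened, backtracking does not take it back.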

David Knipe answered Oct 23 '22 04:10