I have updated this question because the issue I was originally chasing turned out to be an altogether different bug (not interesting in this context). But the second-order mistake I made while testing is something others may run into, and it produced an answer with a very interesting insight, so I'll leave this here as a question.
I was trying to track down an issue with regular expressions seemingly not matching due to leading zeros. I found that none of the following regexes matched in my command line tests:
"005630" =~ /^0056(10|11|15|20|21|25|30|31)$/
"005630" =~ /0056(10|11|15|20|21|25|30|31)/
"005630" =~ /56(10|11|15|20|21|25|30|31)/
"005630" =~ /..56(10|11|15|20|21|25|30|31)/
"005630" =~ /..5630/
"005630" =~ /005630/
"005630" =~ /^005630$/
"005630" =~ /5630/
"005630" =~ /(0)*5630/
"005630" =~ /5630/g
"005630" =~ m/5630/g
This did match:
"x005630" =~ /0056(10|11|15|20|21|25|30|31)/
The same held for the other patterns: once I added a leading letter, they matched.
The test code was (tested with Cygwin Perl v5.10.1 on a Cygwin bash):
perl -e "print ( "005630" =~ /0056(10|11|15|20|21|25|30|31)/)" # does not display a true value
perl -e "print ( "x005630" =~ /0056(10|11|15|20|21|25|30|31)/)" # displays a true value
The quoting here is obviously a mistake (you can't use an unescaped " in a string quoted with "). But I still didn't understand why the second line works despite the incorrect quoting.
Note: This could also occur in other situations without regular expressions.
Given the commands
perl -e "print ( "005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
perl -e "print ( "x005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
only the second prints a match, because Perl supports octal numeric literals. As you figured out, your shell is eating the quotes, so you're actually executing the statements:
print ( 005630 =~ /0056(10|11|15|20|21|25|30|31)/);
print ( x005630 =~ /0056(10|11|15|20|21|25|30|31)/);
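You can see what the shell hands to Perl without running Perl at all. The following is a small sketch, assuming a POSIX-style shell such as bash; the variable names arg_zero and arg_x are made up for illustration. Each assignment uses exactly the quoting from the question, and printing the result shows that the inner quotes are gone:

```shell
# The outer double quotes end at the first inner quote, so 005630 is left
# unquoted and the three adjacent pieces are joined into a single word.
arg_zero="print ( "005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
arg_x="print ( "x005630" =~ /0056(10|11|15|20|21|25|30|31)/)"

printf '%s\n' "$arg_zero"   # print ( 005630 =~ /0056(10|11|15|20|21|25|30|31)/)
printf '%s\n' "$arg_x"      # print ( x005630 =~ /0056(10|11|15|20|21|25|30|31)/)
```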
Any numeric literal (an unquoted number) that begins with a zero not immediately followed by a decimal point (or by an x or b prefix, which mark hexadecimal and binary literals) is treated as an octal number.
perl -e "print 005630 . ''" # prints 2968
perl -e "print x005630 . ''" # prints x005630
(The . '' is needed here to ensure that the bareword is treated as a string. The =~ operator does that in your example.)
So the reason your regex doesn't match is that your string doesn't contain what you think it does.