 

Perl regexps not matching string with leading zeros / incorrectly escaped numerals with leading zeros on command line in Perl

Tags:

regex

perl

I have updated this question: in the original version, the issue I was chasing turned out to be an altogether different bug (not interesting in this context). But a second-order mistake I made while testing is something others may run into, and it produced an answer with a very interesting insight, so I'll leave this here as a question.

I was trying to track down an issue with regular expressions that seemingly failed to match because of leading zeros. In my command-line tests, none of the following regexps matched:

"005630" =~ /^0056(10|11|15|20|21|25|30|31)$/
"005630" =~ /0056(10|11|15|20|21|25|30|31)/  
"005630" =~ /56(10|11|15|20|21|25|30|31)/
"005630" =~ /..56(10|11|15|20|21|25|30|31)/
"005630" =~ /..5630/
"005630" =~ /005630/
"005630" =~ /^005630$/
"005630" =~ /5630/
"005630" =~ /(0)*5630/
"005630" =~ /5630/g
"005630" =~ m/5630/g

This did match:

"x005630" =~ /0056(10|11|15|20|21|25|30|31)/

The same was true for the others: once I added a leading letter, the match succeeded.

The test code was (tested with Cygwin Perl v5.10.1 on a Cygwin bash):

perl -e "print ( "005630" =~ /0056(10|11|15|20|21|25|30|31)/)"   # does not display a true value
perl -e "print ( "x005630" =~ /0056(10|11|15|20|21|25|30|31)/)"  # displays a true value

The quoting here is obviously a mistake (you can't use an unescaped " inside a string that is itself quoted with "). But I still didn't understand why the second line works despite the incorrect quoting.

Note: This could also occur in other situations without regular expressions.

Asked by FelixD, Dec 26 '22 at 14:12


1 Answer

Given the commands

perl -e "print ( "005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
perl -e "print ( "x005630" =~ /0056(10|11|15|20|21|25|30|31)/)"

only the second line prints a match because Perl supports octal numeric literals. As you figured out, your shell is eating the inner quotes, so you're actually executing these statements:

print ( 005630 =~ /0056(10|11|15|20|21|25|30|31)/);
print ( x005630 =~ /0056(10|11|15|20|21|25|30|31)/);
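One way to see this directly is to swap `perl -e` for `echo` (an inspection trick added here, not part of the original answer), so the shell prints the program text it would have handed to Perl. The inner double quotes close and reopen the outer quoted string, so they are consumed by the shell:

```shell
# echo shows the argument exactly as the shell would pass it to perl -e.
echo "print ( "005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
# prints: print ( 005630 =~ /0056(10|11|15|20|21|25|30|31)/)
echo "print ( "x005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
# prints: print ( x005630 =~ /0056(10|11|15|20|21|25|30|31)/)
```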

Any numeric literal (an unquoted number) that begins with a zero that isn't immediately followed by a decimal point is treated as an octal number (apart from the 0x and 0b prefixes, which denote hexadecimal and binary).

perl -e "print 005630 . ''"  # prints 2968
perl -e "print x005630 . ''" # prints x005630

(The . '' is needed here to ensure that the bareword is treated as a string. The =~ operator does that in your example.)

So the reason your regex doesn't match is that your string doesn't contain what you think it does.

Answered by cjm, May 15 '23 at 02:05