The following code prints "34" instead of the expected ".34":
use strict;
use warnings;
use Regexp::Common;
my $regex = qr/\b($RE{num}{real})\s*/;
my $str = "This is .34 meters of cable";
if ($str =~ /$regex/) {
print $1;
}
Do I need to fix my regex? (The word boundary is needed, because without it the pattern would match inside a string like xx34, which I don't want.)
Or is it a bug in Regexp::Common? I always thought that the longest match should win.
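The word boundary \b is a context-dependent regex construct. When it is followed by a word char (letter, digit or _), that location must be preceded by either the start of the string or a non-word char. When it is followed by a non-word char, it requires a word char immediately before that location. In this concrete case, the number starts with a dot, which is a non-word char, and the dot is preceded by a space, so \b cannot match in front of the dot. The first position where the whole pattern can match is between the "." and the "3", which gives "34". This is not a bug in Regexp::Common: Perl returns the leftmost match it can find, not the longest one.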
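To see where \b can and cannot match in the sample string, here is a minimal sketch. It assumes a simplified, hand-written number pattern ($num, a hypothetical stand-in for $RE{num}{real}, not the module's actual pattern) so that it runs without Regexp::Common:

```perl
use strict;
use warnings;

my $str = "This is .34 meters of cable";

# Hypothetical, simplified stand-in for $RE{num}{real} (an assumption,
# not the pattern Regexp::Common really generates).
my $num = qr/[+-]?(?:\d+(?:\.\d*)?|\.\d+)/;

# Is there a word boundary between the space and the dot?
# Both are non-word characters, so there is not.
my $boundary_before_dot = ($str =~ /\s\b\./) ? 1 : 0;

# Is there a word boundary between the dot and the first digit?
# A non-word char followed by a word char, so there is.
my $boundary_after_dot = ($str =~ /\.\b\d/) ? 1 : 0;

# With \b in front, the number can only start at the digit.
my ($match) = $str =~ /\b($num)/;

print "boundary before dot: $boundary_before_dot\n";  # prints 0
print "boundary after dot:  $boundary_after_dot\n";   # prints 1
print "match: $match\n";                              # prints 34
```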
You may use a non-ambiguous word boundary expressed with a negative lookbehind:
my $regex = qr/(?<!\w)($RE{num}{real})/;
               ^^^^^^^
The (?<!\w) negative lookbehind always denotes one thing: fail the match if there
is a word character immediately to the left of the current location.
Or, use a whitespace boundary if you want your matches to only occur after whitespace or start of string:
my $regex = qr/(?<!\S)($RE{num}{real})/;
               ^^^^^^^
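The three boundary styles can be compared side by side. This is only a sketch: $num is again a hypothetical simplified stand-in for $RE{num}{real}, and first_match is a helper name made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for $RE{num}{real} (an assumption).
my $num = qr/[+-]?(?:\d+(?:\.\d*)?|\.\d+)/;

# Illustrative helper: return the first captured number, or undef.
sub first_match {
    my ($re, $str) = @_;
    return $str =~ $re ? $1 : undef;
}

my @patterns = (
    [ 'word boundary'       => qr/\b($num)/      ],
    [ 'not after word char' => qr/(?<!\w)($num)/ ],
    [ 'not after non-space' => qr/(?<!\S)($num)/ ],
);

my @inputs = ("This is .34 meters of cable", "xx34", "(.5)");

for my $p (@patterns) {
    my ($label, $re) = @$p;
    for my $in (@inputs) {
        my $m = first_match($re, $in);
        printf "%-20s %-30s %s\n", $label, qq("$in"),
            defined $m ? $m : '(no match)';
    }
}
```

With these inputs, \b yields 34 for the sentence and 5 for "(.5)", (?<!\w) yields .34 and .5, and (?<!\S) yields .34 but rejects "(.5)" because the number follows a parenthesis rather than whitespace. All three reject xx34.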