 

Perl Regexp::Common package not matching certain real numbers when used with word boundary

Tags: regex, perl, pcre

The following code prints "34" instead of the expected ".34":

use strict;
use warnings;

use Regexp::Common;

my $regex = qr/\b($RE{num}{real})\s*/;
my $str = "This is .34 meters of cable";

if ($str =~ /$regex/) {
    print "$1\n";    # prints "34", not the expected ".34"
}

Do I need to fix my regex? (The word boundary is needed because without it the pattern would also match inside strings like xx34, which I don't want.)

Or is it a bug in Regexp::Common? I always thought the longest match should win.

asked Dec 16 '25 20:12 by Demeter P. Chen


1 Answer

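The word boundary is a context-dependent regex construct. When \b is followed by a word char (letter, digit or _), that position must be preceded either by the start of the string or by a non-word char. When \b is followed by a non-word char, it requires a word char right before that position. In this concrete case, .34 starts with a non-word char (the dot), and the character before the dot is a space, which is also a non-word char, so \b cannot match in front of .34. The engine then tries the following positions and succeeds between . and 3, which is why only 34 is captured. This is not a bug in Regexp::Common: Perl's regex engine returns the leftmost position at which the whole pattern matches, not the longest possible match, and no match starting at the dot exists here.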
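To make this concrete, here is a minimal sketch that lists every offset where \b holds in the question's string and then runs the \b-anchored match. It does not use Regexp::Common: the $real pattern below is a hypothetical, simplified stand-in for $RE{num}{real} (optional sign, digits, optional fraction, no exponents), assumed to be close enough to show the same behavior.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for Regexp::Common's $RE{num}{real}:
# optional sign, digits, optional fraction (no exponents or separators).
my $real = qr/[-+]?(?=\.?\d)\d*(?:\.\d*)?/;

my $str = "This is .34 meters of cable";

# Record every offset where \b holds. Offset 8 (between the space and
# the ".") never shows up: both neighbours are non-word characters.
my @boundaries;
while ($str =~ /\b/g) {
    push @boundaries, pos $str;
}
print "word boundaries at: @boundaries\n";

# The engine reports the leftmost offset where the whole pattern succeeds.
# The first \b in front of the number sits at offset 9, between "." and
# "3", so only "34" is captured.
my ($matched) = $str =~ /\b($real)/;
print "matched '$matched' at offset ", $-[1], "\n";
```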

You can use an unambiguous left-hand word boundary expressed as a negative lookbehind:

my $regex = qr/(?<!\w)($RE{num}{real})/;
               ^^^^^^^

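The (?<!\w) negative lookbehind always means one thing: fail the match if there is a word character immediately to the left of the current location. It therefore succeeds at the start of the string or after any non-word character (such as the space in front of .34), while still rejecting xx34.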
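A small side-by-side sketch of the two anchors, again using a hypothetical simplified stand-in for $RE{num}{real} rather than the module itself. It checks the question's sentence and the xx34 case the asker wants to reject.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for Regexp::Common's $RE{num}{real}.
my $real = qr/[-+]?(?=\.?\d)\d*(?:\.\d*)?/;

my $with_b          = qr/\b($real)/;        # the question's approach
my $with_lookbehind = qr/(?<!\w)($real)/;   # fails only if a word char is to the left

my %result;
for my $str ("This is .34 meters of cable", "xx34") {
    my ($m_b)  = $str =~ $with_b;
    my ($m_lb) = $str =~ $with_lookbehind;
    $result{$str} = [ $m_b, $m_lb ];
    printf "%-28s \\b: %-4s (?<!\\w): %s\n",
        $str, $m_b // 'none', $m_lb // 'none';
}
```

Both variants reject xx34; only the lookbehind version captures the leading dot of .34.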

Alternatively, use a whitespace boundary if matches should only occur after whitespace or at the start of the string:

my $regex = qr/(?<!\S)($RE{num}{real})/;
               ^^^^^^^
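To see where (?<!\w) and (?<!\S) differ, here is a minimal sketch over a few sample strings, still assuming a hypothetical simplified stand-in for $RE{num}{real} instead of the real module pattern.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for Regexp::Common's $RE{num}{real}.
my $real = qr/[-+]?(?=\.?\d)\d*(?:\.\d*)?/;

my %found;
for my $str ("This is .34 meters of cable", ".34 m", "xx34", "(.34)") {
    # (?<!\w): no word char to the left; (?<!\S): no non-space char to the left
    my ($no_word)  = $str =~ /(?<!\w)($real)/;
    my ($no_space) = $str =~ /(?<!\S)($real)/;
    $found{$str} = [ $no_word, $no_space ];
    printf "%-28s (?<!\\w): %-4s (?<!\\S): %s\n",
        $str, $no_word // 'none', $no_space // 'none';
}
```

The "(.34)" row is where the two differ: (?<!\w) accepts the number after "(", while (?<!\S) rejects it because "(" is not whitespace.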
answered Dec 19 '25 18:12 by Wiktor Stribiżew


