I'm interested in changing the regex word boundary \b to include other characters (for example, a . wouldn't count as a boundary). I understand that it is a boundary between \w and \W characters.
local $_ = ".test";    # lexical "my $_" was removed in Perl 5.24
if ( /(\btest\b)/ ){
    print;
    print " $1\n";
}
if ( /((?:(?<=\W)|^)test(?:(?=\W)|$))/ ){
    print;
    print " $1\n";
}
This is what I came up with, and all I'd have to do is change \W to something like [^\w.], but I still want to know how Perl interprets \b in a regular expression. I tried deparsing it like this:
use B::Deparse;

my $deparser = B::Deparse->new("-sC", "-x10");
print $deparser->coderef2text( sub {
    local $_ = ".test";    # lexical "my $_" was removed in Perl 5.24
    if ( /(\btest\b)/ ){
        print;
        print " $1\n";
    }
    if ( /((?:(?<=\W)|^)test(?:(?=\W)|$))/ ){
        print;
        print " $1\n";
    }
});
I was hoping it would expand \b into what it was equivalent to. What is \b equivalent to? Can you deparse \b or other expressions further somehow?
\b is functionally equivalent to (?<!\w)(?=\w)|(?<=\w)(?!\w).
\B is functionally equivalent to (?<!\w)(?!\w)|(?<=\w)(?=\w).
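A quick way to convince yourself of the first equivalence is to collect every position where each pattern matches and compare the two lists. This is only a sketch: the helper name boundary_positions and the sample strings are made up for illustration, and agreement on a handful of inputs is a sanity check, not a proof.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset in $str at which the
# zero-width pattern $re matches.
sub boundary_positions {
    my ($re, $str) = @_;
    my @positions;
    # With /g, a zero-length pattern matches once at each qualifying offset.
    while ($str =~ /$re/g) {
        push @positions, pos($str);
    }
    return @positions;
}

my $builtin  = qr/\b/;
my $expanded = qr/(?<!\w)(?=\w)|(?<=\w)(?!\w)/;

# Arbitrary sample inputs, chosen only for illustration.
for my $str (".test", "foo.bar baz", "", "  ", "a") {
    my $got_builtin  = join ",", boundary_positions($builtin,  $str);
    my $got_expanded = join ",", boundary_positions($expanded, $str);
    my $verdict = $got_builtin eq $got_expanded ? "same" : "DIFFERENT";
    print "'$str': [$got_builtin] vs [$got_expanded] $verdict\n";
}
```

The same loop works for \B if you swap in the second pair of patterns.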
The goal of Deparse is to produce a readable representation of Perl's understanding of the code. For example, f() and g(); and g() if f(); compile identically, so Deparse will give the more readable option, g() if f();, for both.
$ perl -MO=Deparse -e'f() and g()'
g() if f();
-e syntax OK
This means that even if \b and (?<!\w)(?=\w)|(?<=\w)(?!\w) compiled to the same code, and even if Deparse understood compiled regexes, it would still give you \b. Deparse is not what you want.
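You can see this from inside a script as well. The sketch below deparses a small sub and checks that the pattern comes back as written; the sub body and the variable names are just for illustration.

```perl
use strict;
use warnings;
use B::Deparse;

# Deparse an anonymous sub containing a single regex match.
my $deparser = B::Deparse->new;
my $text     = $deparser->coderef2text( sub { /\btest\b/ } );

print $text, "\n";

# The pattern is reproduced from its source text, not expanded.
print "pattern left as written\n" if index($text, '\btest\b') >= 0;
```

The deparsed text may also include pragma lines such as use strict; those come from the enclosing scope, not from the regex.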
Maybe you're thinking of Concise. It shows what really gets executed. Notice the use of and in the following even though the original Perl uses if:
$ perl -MO=Concise,-exec -e'g() if f()'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <0> pushmark s
4 <#> gv[*f] s/EARLYCV
5 <1> entersub[t6] sKS/TARG
6 <|> and(other->7) vK/1
7 <0> pushmark s
8 <#> gv[*g] s/EARLYCV
9 <1> entersub[t3] vKS/TARG
a <@> leave[1 ref] vKP/REFC
-e syntax OK
But like Deparse, Concise knows nothing of the regex program the regex engine created from the string. So this is still not what you want.
However, there is an equivalent of Concise for regex patterns: use re 'debug';
$ perl -Mre=debug -E'qr/\b/'
Compiling REx "\b"
Final program:
1: BOUNDU (2)
2: END (0)
stclass BOUNDU minlen 0
Freeing REx: "\b"
Apparently, \b is implemented as its own operation. For comparison,
$ perl -Mre=debug -E'qr/(?<!\w)(?=\w)|(?<=\w)(?!\w)/'
Compiling REx "(?<!\w)(?=\w)|(?<=\w)(?!\w)"
Final program:
1: BRANCH (12)
2: UNLESSM[-1] (7)
4: POSIXU[\w] (5)
5: SUCCEED (0)
6: TAIL (7)
7: IFMATCH[0] (23)
9: POSIXU[\w] (10)
10: SUCCEED (0)
11: TAIL (23)
12: BRANCH (FAIL)
13: IFMATCH[-1] (18)
15: POSIXU[\w] (16)
16: SUCCEED (0)
17: TAIL (18)
18: UNLESSM[0] (23)
20: POSIXU[\w] (21)
21: SUCCEED (0)
22: TAIL (23)
23: END (0)
minlen 0
Freeing REx: "(?<!\w)(?=\w)|(?<=\w)(?!\w)"
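As for the boundary you actually asked about: one way to get it is to start from the lookaround form of \b and swap \w for a class that also contains the dot. A minimal sketch, assuming the "word" characters you want are exactly [\w.]; the names $word_char, $boundary and contains_test_word are made up for this example.

```perl
use strict;
use warnings;

# Assumption: a "word" character is \w or a literal dot.
my $word_char = qr/[\w.]/;

# Same shape as the expansion of \b, with \w replaced by $word_char.
my $boundary = qr/(?<!$word_char)(?=$word_char)|(?<=$word_char)(?!$word_char)/;

# Hypothetical helper: true if "test" appears delimited by the custom boundary.
sub contains_test_word {
    my ($str) = @_;
    return $str =~ /${boundary}test${boundary}/ ? 1 : 0;
}

for my $str ("test", ".test", "test.", "a test b") {
    printf "%-10s %s\n", "'$str'", contains_test_word($str) ? "match" : "no match";
}
```

Because $boundary is a qr// object, it is wrapped in its own group when interpolated, so the alternation inside it doesn't leak into the surrounding pattern. With this class, ".test" and "test." no longer match, since the dot counts as part of the word.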