How to use RegEx with . replace in JavaScript. To use RegEx, the first argument of replace will be replaced with regex syntax, for example /regex/ . This syntax serves as a pattern where any parts of the string that match it will be replaced with the new substring.
For example, the replacement pattern $1 indicates that the matched substring is to be replaced by the first captured group.
. Your regex starts with (?= (ensure that you can see, but don't consume) followed by . * (zero or more of any character).
Ok, I’m going to go from the simple to the sublime. Enjoy!
Given this:
#!/usr/bin/perl
$_ = <<"End_of_G&S";
This particularly rapid,
unintelligible patter
isn't generally heard,
and if it is it doesn't matter!
End_of_G&S
my $count = 0;
Then this:
s{
\b ( [\w']+ ) \b
}{
sprintf "(%s)[%d]", $1, ++$count;
}gsex;
produces this
(This)[1] (particularly)[2] (rapid)[3],
(unintelligible)[4] (patter)[5]
(isn't)[6] (generally)[7] (heard)[8],
(and)[9] (if)[10] (it)[11] (is)[12] (it)[13] (doesn't)[14] (matter)[15]!
Whereas this:
s/\b([\w']+)\b/#@{[++$count]}=$1/g;
produces this:
#1=This #2=particularly #3=rapid,
#4=unintelligible #5=patter
#6=isn't #7=generally #8=heard,
#9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!
This puts the incrementation within the match itself:
s/ \b ( [\w']+ ) \b (?{ $count++ }) /#$count=$1/gx;
yields this:
#1=This #2=particularly #3=rapid,
#4=unintelligible #5=patter
#6=isn't #7=generally #8=heard,
#9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!
This
s{ \b ( [\w'] + ) \b }
{ join " " => ($1) x ++$count }gsex;
generates this delightful answer:
This particularly particularly rapid rapid rapid,
unintelligible unintelligible unintelligible unintelligible patter patter patter patter patter
isn't isn't isn't isn't isn't isn't generally generally generally generally generally generally generally heard heard heard heard heard heard heard heard,
and and and and and and and and and if if if if if if if if if if it it it it it it it it it it it is is is is is is is is is is is is it it it it it it it it it it it it it doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't matter matter matter matter matter matter matter matter matter matter matter matter matter matter matter!
There are more robust approaches to word boundaries that work for plural possessives (the previous approaches don’t), but I suspect your mystery lies in getting the ++$count
to fire, not with the subtleties of \b
behavior.
I really wish people understood that \b
isn’t what they think it is.
They always think it means there's white space or the edge of the string
there. They never think of it as \w\W
or \W\w
transitions.
# same as using a \b before:
(?(?=\w) (?<!\w) | (?<!\W) )
# same as using a \b after:
(?(?<=\w) (?!\w) | (?!\W) )
As you see, it's conditional depending on what it's touching. That’s what the (?(COND)THEN|ELSE)
clause is for.
This becomes an issue with things like:
$_ = qq('Tis Paul's parents' summer-house, isn't it?\n);
my $count = 0;
s{
(?(?=[\-\w']) (?<![\-\w']) | (?<![^\-\w']) )
( [\-\w'] + )
(?(?<=[\-\w']) (?![\-\w']) | (?![^\-\w']) )
}{
sprintf "(%s)[%d]", $1, ++$count
}gsex;
print;
which correctly prints
('Tis)[1] (Paul's)[2] (parents')[3] (summer-house)[4], (isn't)[5] (it)[6]?
1960s-style ASCII is about 50 years out of date. Just as whenever you see anyone write [a-z]
, it’s nearly always wrong, it turns out that things like dashes and quotation marks shouldn’t show up as literals in patterns, either. While we’re at it, you probably don’t want to use \w
, because that includes numbers and underscores as well, not just alphabetics.
Imagine this string:
$_ = qq(\x{2019}Tis Ren\x{E9}e\x{2019}s great\x{2010}grandparents\x{2019} summer\x{2010}house, isn\x{2019}t it?\n);
which you could have as a literal with use utf8
:
use utf8;
$_ = qq(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?\n);
This time I’ll go at the pattern a bit differently, separating out my definition of terms from their execution to try to make it more readable and thence maintainable:
#!/usr/bin/perl -l
use 5.10.0;
use utf8;
use open qw< :std :utf8 >;
use strict;
use warnings qw< FATAL all >;
use autodie;
$_ = q(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?);
my $count = 0;
s{ (?<WORD> (?&full_word) )
# the rest is just definition
(?(DEFINE)
(?<word_char> [\p{Alphabetic}\p{Quotation_Mark}] )
(?<full_word>
# next line won't compile cause
# fears variable-width lookbehind
#### (?<! (?&word_char) ) )
# so must inline it
(?<! [\p{Alphabetic}\p{Quotation_Mark}] )
(?&word_char)
(?:
\p{Dash}
| (?&word_char)
) *
(?! (?&word_char) )
)
) # end DEFINE declaration block
}{
sprintf "(%s)[%d]", $+{WORD}, ++$count;
}gsex;
print;
That code when run produces this:
(’Tis)[1] (Renée’s)[2] (great‐grandparents’)[3] (summer‐house)[4], (isn’t)[5] (it)[6]?
Ok, so that may have beeen FMTEYEWTK about fancy regexes, but aren’t you glad you asked? ☺
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With