I want to catch Roman numerals inside a string (numbers below 80 are enough). I found a good base for this in "How do you match only valid roman numerals with a regular expression?". The problem is that it deals with whole strings. I have not yet found a way to detect Roman numerals inside a string, because nothing in the pattern is mandatory: every group may be optional. So far I have tried something like this:
my $x = ' some text I-LXIII iv more ';
if ( $x =~ s/\b(
(
(XC|XL|L?X{0,3}) # first group 10-90
|
(IX|IV|V?I{0,3}) # second group 1-9
)+
)
\b/>$1</xgi ) { # mark every occurrence
say $x;
}
__END__
><some>< ><text>< ><>I<><-><>LXIII<>< ><>iv<>< ><more><
Desired output:
some text >I<->LXIII< >iv< more
So this one also matches bare word boundaries, because all the groups are optional. How can I get this done? How can I make one of those two groups mandatory when there is no way to tell which one it must be? Other approaches to catching Roman numerals are welcome too.
You can use the Roman CPAN module:
use feature 'say';
use Roman;
my $x = ' some text I-LXIII VII XCVI IIIXII iv more ';
if ( $x =~
s/\b
(
[IVXLC]+
)
\b
/isroman($1) ? ">$1<" : $1/exgi ) {
say $x;
}
Output:
some text >I<->LXIII< >VII< >XCVI< IIIXII >iv< more
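If installing a module is not an option, the same idea (grab every candidate word, then validate it separately) can be sketched in plain Perl. This is only an illustration: `mark_if_roman` is a hypothetical helper, not part of the Roman module, and it accepts only numerals built from the tens and units groups in the question (1 to 99).

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: wrap $word in >...< if it is a well-formed numeral
# made of a tens group followed by a units group, otherwise return it as is.
# The caller only passes non-empty words, so the all-optional pattern is safe.
sub mark_if_roman {
    my ($word) = @_;
    return $word =~ /\A (?:XC|XL|L?X{0,3}) (?:IX|IV|V?I{0,3}) \z/xi
        ? ">$word<"
        : $word;
}

my $text = ' some text I-LXIII VII XCVI IIIXII iv more ';

# Step 1: find candidate words made only of Roman letters.
# Step 2: let the helper decide whether to mark each one.
(my $marked = $text) =~ s/\b([IVXLC]+)\b/mark_if_roman($1)/gie;
say $marked;
```

Because the candidate pattern uses `+`, the helper never sees an empty string, so the empty-match problem from the question does not arise here.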
This is where Perl lets us down with its missing \< and \> (beginning and end word boundary) constructs that are available elsewhere. A pattern like \b...\b will match even if the ... consumes none of the target string, because the second \b will happily match the beginning word boundary a second time.
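A minimal sketch of that effect, using a made-up two-letter string and a single optional `X` in place of the Roman numeral groups:

```perl
use strict;
use warnings;
use feature 'say';

my $text = 'ab';

# X? is optional, so the capture group can match the empty string.
# Both \b assertions then succeed at the very same position.
(my $marked = $text) =~ s/\b(X?)\b/>$1</g;
say $marked;
```

The substitution fires once at the start of the word and once at the end, even though there is no `X` anywhere in the string, so this prints `><ab><`.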
However, an end word boundary is just (?<=\w)(?!\w), so we can use this instead.
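To see that these lookarounds pick out only one kind of boundary, here is a small sketch that records where each one matches in a sample string. The beginning-boundary counterpart `(?<!\w)(?=\w)` is added here for comparison and is an assumption of this example rather than something used in the program below.

```perl
use strict;
use warnings;
use feature 'say';

my $text = 'foo bar';
my (@starts, @ends);

# Beginning of a word: no word character before, a word character after.
push @starts, pos($text) while $text =~ /(?<!\w)(?=\w)/g;

# End of a word: a word character before, no word character after.
push @ends, pos($text) while $text =~ /(?<=\w)(?!\w)/g;

say "starts: @starts";
say "ends:   @ends";
```

A plain `\b` would match at all four of these positions, which is why it cannot tell the start of a word from its end.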
This program will do what you want. It does a look-ahead for a string of potential Roman characters enclosed in word boundaries (so we must be at a beginning word boundary) and then checks for a legal Roman number that isn't followed by a word character (so now we're at an end word boundary).
Note that I've reversed your >...< marks as they were confusing me.
use strict;
use warnings;
use feature 'say';
my $x = ' some text I-LXIII iv more ';
if ( $x =~ s{
(?= \b [CLXVI]+ \b )
(
(?:XC|XL|L?X{0,3})?
(?:IX|IV|V?I{0,3})?
)
(?!\w)
}
{<$1>}xgi ) {
say $x;
}
Output:
some text <I>-<LXIII> <iv> more