I need to enclose every string that is not yet enclosed in a pair of delimiters. Example text:
Some text or random characters here. {% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}Another
bla-bla
random text outside of the delimiters - called
as "free text".
I need to enclose all occurrences of free text with
{%ORIG .... original free text ... %}
without modifying the strings that are already enclosed. So, in the above example, two sections of free text need to be enclosed, and the result should be:
{%ORIG Some text or random characters here. %}{% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}{%ORIG Another
bla-bla
random text outside of the delimiters - called
as "free text".%}
So, the opening delimiter is {% and the closing delimiter is %}.
Questions:
You could do it with a regex, with the help of recursive subpattern calls such as (?R).
For example:
$_ = <<'_STR_';
Some text or random characters here. {% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}Another
bla-bla
random text outside of the delimiters - called
as "free text".
_STR_
# The braces are escaped because an unescaped literal { in a pattern
# is deprecated in newer Perl versions.
s/
    ( \{% (?R)* %\} )               # match balanced {% %} groups
  |
    ( (?: (?! \{% | %\} ) . )+ )    # match everything except {% %}
/
    defined $1 ? $1 : "{%ORIG $2%}";  # if {% ... %} matched, leave it as is. else enclose it
/gsex;
print;
Output:
{%ORIG Some text or random characters here. %}{% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}{%ORIG Another
bla-bla
random text outside of the delimiters - called
as "free text".
%}
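To see the recursion in isolation, here is a minimal sketch that only collects the top-level balanced groups and ignores the free text. The helper name top_level_groups and the sample string are made up for illustration; the pattern uses (?1) (recurse into group 1) instead of (?R), which is an equivalent choice here, not the answer's exact regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): return every
# top-level {% ... %} group found in the text, nested groups included.
sub top_level_groups {
    my ($text) = @_;
    my @groups;
    while ($text =~ /
        (                                   # group 1: one balanced group
            \{%
            (?:
                (?: (?! \{% | %\} ) . )++   # a run of text with no delimiter
              |
                (?1)                        # or a nested balanced group
            )*
            %\}
        )
    /gsx) {
        push @groups, $1;
    }
    return @groups;
}

# Made-up sample input.
print "$_\n" for top_level_groups('a {% b {% c %} d %} e {% f %}');
```

After a recursive call returns, the capture groups hold the values of the outermost match, so $1 is always the whole top-level group rather than the innermost one.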
Jonathan Leffler's suggestion is right. You can solve this problem using the Text::Balanced module with its extract_tagged function:
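#!/usr/bin/env perl
use warnings;
use strict;
use Text::Balanced qw<extract_tagged>;

my ($open_delim, $close_delim) = qw( {% %} );

# Slurp the whole input into a single scalar.
my $text = do { local $/ = undef; <> };
chomp $text;

while (1) {
    # In list context: $r[0] = extracted group, $r[1] = remainder, $r[2] = prefix.
    my @r = extract_tagged($text, $open_delim, $close_delim, '(?s).*?(?={%)', undef);

    # Free text before the next opening delimiter: wrap it.
    if (length $r[2]) {
        printf qq|%sORIG %s%s|, $open_delim, $r[2], $close_delim;
    }

    # Text that is already enclosed: print it as is.
    if (length $r[0]) {
        printf qq|%s|, $r[0];
    }
    else {
        # No more groups: wrap whatever free text is left and stop.
        if (length $r[1]) {
            printf qq|%sORIG %s%s|, $open_delim, $r[1], $close_delim;
        }
        last;
    }

    $text = $r[1];
}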
This program loops until there are no more delimiters in the text. In each iteration it takes the prefix (the text before the next opening delimiter, $r[2]) and surrounds it with the delimiters, and it prints the text that is already enclosed ($r[0]) as is. When no further group is found, the remaining text ($r[1]) is wrapped and the loop ends.
At the beginning I slurp the content of the whole file, because this function only works on a scalar. Take a look at the documentation to learn what the function returns. I hope this gives you an idea that helps you solve your problem, in case it is far more complex than this example.
Just for testing, run it like:
perl script.pl infile
That yields:
{%ORIG Some text or random characters here. %}{% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}{%ORIG Another
bla-bla
random text outside of the delimiters - called
as "free text".%}
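To make the three values used in the loop easier to picture, here is a small sketch of a single extract_tagged call in list context. The sample string and variable names are made up for illustration, and the delimiters are passed with escaped braces, which is an assumption of this sketch rather than what the script above does.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# Made-up sample input, not taken from the question.
my $sample = 'free {% a {% b %} c %} rest';

# One call in list context. The first three returned values are the
# extracted group, the remainder of the text, and the skipped prefix.
my ($extracted, $remainder, $prefix) =
    extract_tagged($sample, '\{%', '%\}', '(?s).*?(?=\{%)');

print "prefix:    [$prefix]\n";
print "extracted: [$extracted]\n";
print "remainder: [$remainder]\n";
```

The prefix is what the script wraps in {%ORIG ... %}, the extracted group is printed unchanged, and the remainder becomes the input for the next iteration.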