Enclosing still not enclosed strings with paired delimiters

Tags: regex, perl

I need to enclose strings that are not yet enclosed with a pair of delimiters. Example text:

Some text or random characters here. {% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}Another
bla-bla

random text outside of the delimiters - called
as "free text".

I need to enclose all occurrences of free text with

{%ORIG .... original free text ... %}

without modifying the strings that are already enclosed. So, in the above example, two sections of free text need to be enclosed, and the result should be:

{%ORIG Some text or random characters here. %}{% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}{%ORIG Another
bla-bla

random text outside of the delimiters - called
as "free text".%}

So, the opening delimiter is {% and the closing is %}.

Questions:

  • Is it possible to do this with regexes, or do I need to write a parser for it?
  • Is there a CPAN module that I can use for this task?
asked Jan 19 '14 22:01 by novacik


2 Answers

You can do it with a regex, with the help of recursive subpattern calls such as (?R).

For example:

$_ = <<'_STR_';
Some text or random characters here. {% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}Another
bla-bla

random text outside of the delimiters - called
as "free text".
_STR_

s/
  ( \{% (?R)* %\} )              # match balanced {% %} groups (braces escaped so they are always literal)
|
  ( (?: (?! \{% | %\} ) . )+ )   # match everything except {% %}
/
  defined $1 ? $1 : "{%ORIG $2 %}";  # if {% ... %} matched, leave it as is. else enclose it
/gsex;

print;

Output:

{%ORIG Some text or random characters here.  %}{% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}{%ORIG Another
bla-bla

random text outside of the delimiters - called
as "free text".
 %}
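
The output above differs slightly from the one requested in the question: there is an extra space before the closing %}, and the final newline ends up inside the last block. Below is a minimal sketch of a variant that should reproduce the requested spacing, assuming the input is chomped first and the space before %} is dropped from the replacement. The helper name enclose_free_text is hypothetical and not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap each run of top-level free text in {%ORIG ...%}
# and leave balanced {% ... %} groups untouched. Needs Perl 5.10+ for (?R).
sub enclose_free_text {
    my ($text) = @_;
    $text =~ s/
        ( \{% (?R)* %\} )              # a balanced group, possibly nested
      |
        ( (?: (?! \{% | %\} ) . )+ )   # a run of text with no delimiter in it
    /
        defined $1 ? $1 : "{%ORIG $2%}"
    /gsex;
    return $text;
}

my $in = "free {% a {% b %} c %} tail\n";
chomp $in;                             # keep the final newline out of the last block
print enclose_free_text($in), "\n";
# prints: {%ORIG free %}{% a {% b %} c %}{%ORIG  tail%}
```

Whether to chomp depends on whether the trailing newline should count as free text; the requested output in the question suggests it should not.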
answered Nov 15 '22 05:11 by Qtax


Jonathan Leffler's suggestion is right. You can solve this problem using the Text::Balanced module with its extract_tagged function:

#!/usr/bin/env perl

use warnings;
use strict;
use Text::Balanced qw<extract_tagged>;

my ($open_delim, $close_delim) = qw( {% %} );

my $text = do { local $/ = undef; <> };
chomp $text;

while (1) {
    my @r = extract_tagged($text, $open_delim, $close_delim, '(?s).*?(?={%)', undef);
    if (length $r[2]) {
        printf qq|%sORIG %s%s|, $open_delim, $r[2], $close_delim;
    }   

    if (length $r[0]) {
        printf qq|%s|, $r[0];
    }   
    else {
        if (length $r[1]) {
            printf qq|%sORIG %s%s|, $open_delim, $r[1], $close_delim;
        }
        last;
    }   

    $text = $r[1];
}

This program loops until there are no more delimiters left in the text. In each iteration it takes the prefix (the text up to the next opening delimiter, $r[2]) and surrounds it with the ORIG delimiters, then prints the text that is already delimited ($r[0]) as is. When nothing more can be extracted, the remaining text ($r[1]) is wrapped the same way and the loop ends.

At the beginning I slurp the content of the whole file, because this function only works on a scalar. Take a look at the documentation to learn what the function returns. I hope this gives you an idea that helps solve your problem, in case it is far more complex than this example.

Just for testing, run it like:

perl script.pl infile

That yields:

{%ORIG Some text or random characters here. %}{% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}{%ORIG Another
bla-bla

random text outside of the delimiters - called
as "free text".%}
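
For cases more complex than this example, the "write a parser" option from the question can also be quite small. The following is a minimal sketch, not taken from either answer: it splits the text on the two delimiters and tracks the nesting depth by hand. The name enclose_free is hypothetical, and the sketch assumes that unbalanced input should be treated as an error.

```perl
use strict;
use warnings;

# Hypothetical helper: one pass over the text, tracking {% ... %} nesting depth.
sub enclose_free {
    my ($text) = @_;
    my ($out, $free, $depth) = ('', '', 0);

    # The capturing group makes split return the delimiters as tokens too.
    for my $tok (split /(\{%|%\})/, $text) {
        next if $tok eq '';
        if ($tok eq '{%') {
            # Entering a block from the top level: flush collected free text.
            if ($depth == 0 && length $free) {
                $out .= "{%ORIG $free%}";
                $free = '';
            }
            $depth++;
            $out .= $tok;
        }
        elsif ($tok eq '%}') {
            die "unbalanced closing delimiter\n" if $depth == 0;
            $depth--;
            $out .= $tok;
        }
        elsif ($depth > 0) {
            $out .= $tok;       # inside a delimited block: copy verbatim
        }
        else {
            $free .= $tok;      # top-level free text
        }
    }
    die "unclosed opening delimiter\n" if $depth > 0;
    $out .= "{%ORIG $free%}" if length $free;
    return $out;
}

print enclose_free('free {% a {% b %} c %} tail'), "\n";
# prints: {%ORIG free %}{% a {% b %} c %}{%ORIG  tail%}
```

Compared with the regex and Text::Balanced approaches, this version makes the handling of malformed input explicit, at the cost of a few more lines.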
answered Nov 15 '22 05:11 by Birei