Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to do conditional greedy match in Perl?

I want Perl to parse a code text and identify certain stuffs, example code:

use strict;
use warnings;

$/ = undef;

while (<DATA>) {
  s/(\w+)(\s*<=.*?;)/$1_yes$2/gs;
  print;
}

__DATA__
always @(posedge clk or negedge rst_n)
if(!rst_n)begin
        d1 <= 0; //perl_comment_4
        //perl_comment_5
        d2 <= 1  //perl_comment_6
                 + 2;
        end
else if( d3 <= d4 && ( d5 <= 3 ) ) begin
        d6 <= d7 +
                 (d8 <= d9 ? 1 : 0);
        //perl_comment_7
        d10 <= d11 <=
                      d12
                        + d13
                            <= d14 ? 1 : 0;
        end

Match target is something that meets all of the following:

(1) It begins with the combination word\s*<=. Here \s* maybe 0 or more spaces, newlines, tabs.

(2) The aforementioned "combination" should be out of any pair of ( and ).

(3) If multiple "combinations" appear consecutively, then take the first one as the beginning. (Something like "greedy" matching at the left boundary)

(4) it ends with the first ; after the "combination" mentioned in (1).

There may be word\s*<= and ; in code comments (there may be anything in comments); this makes things more complicated. To make life easier, I already pre-processed the text, scanning for comments and replacing them with stuff like //perl_comment_6. (This solution seems rather cumbersome and stupid. Any smarter, more elegant solutions?)

What I wanna do:

For all matched word\s*<=, replace word with word_yes. For the example code, d1, d2, d6 and d10 should be replaced by d1_yes, d2_yes, d6_yes and d10_yes, respectively, and all other parts of the text should remain unchanged.

In my current code I use s/(\w+)(\s*<=.*?;)/$1_yes$2/gs;, which correctly recognizes d1, d2 and d10, but fails to recognize d6 and mistakenly recognizes d3.

Any suggestions? Thanks in advance~

like image 600
katyusza Avatar asked Feb 22 '16 11:02

katyusza


1 Answers

This is a lot more complicated that you might imagine, and it is impossible to do properly without writing a parser for the language you are trying to process. However, you may be in luck if your sample is a consistently limited subset of the language

The best way I can see to do this is to use split to separate out all the subsections of the string that are in parentheses from the "top level" sections where the replacements are to be done. Then the changes can be made to the relevant parts and the split sections joined back together

Even this relies on the code having properly balanced parentheses, and an odd open or closing parenthesis that appears in, say, a string or a comment will throw the process out. The regex used in the split has to be recursive so that nested parentheses can be matched, and making it a capturing regex makes split returns all of the parts of the string instead of just the sections between the matches

This code will do as you ask, but beware that, as I described, it is extremely fragile

use strict;
use warnings;

my $data = do {
    local $/;
    <DATA>;
};

my @split = split / ( \( (?> [^()] | (?1) )* \) ) /x, $data;

for ( @split ) {
    next if /[()]/;
    s/ ^ \s* \w+ \K (?= \s* <= ) /_yes/xgm;
}

print join '', @split;


__DATA__
always @(posedge clk or negedge rst_n)
if(!rst_n)begin
        d1 <= 0; //perl_comment_4
        //perl_comment_5
        d2 <= 1  //perl_comment_6
                 + 2;
        end
else if( d3 <= d4 && ( d5 <= 3 ) ) begin
        d6 <= d7 +
                 (d8 <= d9 ? 1 : 0);
        //perl_comment_7
        d10 <= d11 <=
                      d12
                        + d13
                            <= d14 ? 1 : 0;
        end

output

always @(posedge clk or negedge rst_n)
if(!rst_n)begin
        d1_yes <= 0; //perl_comment_4
        //perl_comment_5
        d2_yes <= 1  //perl_comment_6
                 + 2;
        end
else if( d3 <= d4 && ( d5 <= 3 ) ) begin
        d6_yes <= d7 +
                 (d8 <= d9 ? 1 : 0);
        //perl_comment_7
        d10_yes <= d11 <=
                      d12
                        + d13
                            <= d14 ? 1 : 0;
        end
like image 99
Borodin Avatar answered Nov 10 '22 00:11

Borodin