I want Perl to parse a code text and identify certain stuffs, example code:
use strict;
use warnings;
$/ = undef;
while (<DATA>) {
s/(\w+)(\s*<=.*?;)/$1_yes$2/gs;
print;
}
__DATA__
always @(posedge clk or negedge rst_n)
if(!rst_n)begin
d1 <= 0; //perl_comment_4
//perl_comment_5
d2 <= 1 //perl_comment_6
+ 2;
end
else if( d3 <= d4 && ( d5 <= 3 ) ) begin
d6 <= d7 +
(d8 <= d9 ? 1 : 0);
//perl_comment_7
d10 <= d11 <=
d12
+ d13
<= d14 ? 1 : 0;
end
Match target is something that meets all of the following:
(1) It begins with the combination word\s*<=
. Here \s*
maybe 0 or more spaces, newlines, tabs.
(2) The aforementioned "combination" should be out of any pair of (
and )
.
(3) If multiple "combinations" appear consecutively, then take the first one as the beginning. (Something like "greedy" matching at the left boundary)
(4) it ends with the first ;
after the "combination" mentioned in (1).
There may be word\s*<=
and ;
in code comments (there may be anything in comments); this makes things more complicated. To make life easier, I already pre-processed the text, scanning for comments and replacing them with stuff like //perl_comment_6
. (This solution seems rather cumbersome and stupid. Any smarter, more elegant solutions?)
What I wanna do:
For all matched word\s*<=
, replace word
with word_yes
. For the example code, d1, d2, d6 and d10 should be replaced by d1_yes, d2_yes, d6_yes and d10_yes, respectively, and all other parts of the text should remain unchanged.
In my current code I use s/(\w+)(\s*<=.*?;)/$1_yes$2/gs;
, which correctly recognizes d1, d2 and d10, but fails to recognize d6 and mistakenly recognizes d3.
Any suggestions? Thanks in advance~
This is a lot more complicated that you might imagine, and it is impossible to do properly without writing a parser for the language you are trying to process. However, you may be in luck if your sample is a consistently limited subset of the language
The best way I can see to do this is to use split
to separate out all the subsections of the string that are in parentheses from the "top level" sections where the replacements are to be done. Then the changes can be made to the relevant parts and the split sections joined back together
Even this relies on the code having properly balanced parentheses, and an odd open or closing parenthesis that appears in, say, a string or a comment will throw the process out. The regex used in the split
has to be recursive so that nested parentheses can be matched, and making it a capturing regex makes split
returns all of the parts of the string instead of just the sections between the matches
This code will do as you ask, but beware that, as I described, it is extremely fragile
use strict;
use warnings;
my $data = do {
local $/;
<DATA>;
};
my @split = split / ( \( (?> [^()] | (?1) )* \) ) /x, $data;
for ( @split ) {
next if /[()]/;
s/ ^ \s* \w+ \K (?= \s* <= ) /_yes/xgm;
}
print join '', @split;
__DATA__
always @(posedge clk or negedge rst_n)
if(!rst_n)begin
d1 <= 0; //perl_comment_4
//perl_comment_5
d2 <= 1 //perl_comment_6
+ 2;
end
else if( d3 <= d4 && ( d5 <= 3 ) ) begin
d6 <= d7 +
(d8 <= d9 ? 1 : 0);
//perl_comment_7
d10 <= d11 <=
d12
+ d13
<= d14 ? 1 : 0;
end
always @(posedge clk or negedge rst_n)
if(!rst_n)begin
d1_yes <= 0; //perl_comment_4
//perl_comment_5
d2_yes <= 1 //perl_comment_6
+ 2;
end
else if( d3 <= d4 && ( d5 <= 3 ) ) begin
d6_yes <= d7 +
(d8 <= d9 ? 1 : 0);
//perl_comment_7
d10_yes <= d11 <=
d12
+ d13
<= d14 ? 1 : 0;
end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With