Rewriting old program using deprecated $*

Tags:

perl

We have some really old Perl code, last updated in 1997. I'm trying to upgrade it to a newer Perl version, where $* is deprecated.

I've been trying to learn how to rewrite this, but the only help the perlvar documentation gives is "You should use the /s and /m regexp modifiers instead."

  my ($file, $regexp, $flags) = @_;
  my (@found_lines, @tmp_list, $comp_buf);
  local ($*);

  if ($flags =~ tr/c//d)
  {
    $* = 1;
    (substr ($regexp, 0, 1) ne "^") && ($regexp = "^.*$regexp");
    ($regexp !~ /([^\\]|^)(\\\\)*\$$/) && ($regexp .= ".*\$");
    &read_comp ($file, \$comp_buf);
    @found_lines = grep ($_ .= "\n", ($comp_buf =~ /$regexp/g));
  }
  else
  {
    @tmp_list = &read_list ($file, 0);
    @found_lines = grep (/$regexp/, @tmp_list);
  }

  if ($flags eq "q")
  {
    $#found_lines >= 0;
  }
  elsif ($flags eq "a")
  {
    $#found_lines+1;
  }
  else
  {
    @found_lines;
  }

It's hard for me to see how to replace $* here. From what I can understand from the comments, $* is used to enable multi-line matching for the regexp search that follows, so I'm guessing I have to add those modifiers to the regular expressions somehow.

How do I rewrite this code to replace the existing $* instances?

Morti asked Sep 30 '19 10:09


1 Answer

Unfortunately $* is a global variable, so setting it has an effect on all called functions (e.g. read_comp) if they use regexes.

Also, that code is written in a slightly bizarre way:

  • I assume the intention was to enable "multiline" matching for the $comp_buf =~ /$regexp/g part, but $* is set early, so it also affects $regexp !~ /([^\\]|^)(\\\\)*\$$/ and the read_comp call.

  • The checks for whether $regexp already starts/ends with ^/$ respectively are broken. For example, (?:^foo$) is an anchored regex, but the code would not detect that.

  • grep ($_ .= "\n", ...) is a baffling abuse of grep to emulate map. What the code is trying to do is to get the list of lines matched by the regex. However, the way the regex is built it does not match the terminating newline character "\n" on each line, so the code manually adds "\n" to every returned string.

    The sane way of doing that would be:

    @found_lines = map $_ . "\n", ...;   # or map "$_\n", ...
    

    Instead of map we could use an imperative loop, taking advantage of the fact that a for loop aliases its loop variable to the current list element:

    @temp = ...;
    for (@temp) {
        $_ .= "\n";
    }
    @found_lines = @temp;
    

    Instead of a for loop we could use grep for its side effect of iterating over a list:

    @temp = ...;
    grep $_ .= "\n", @temp;
    @found_lines = @temp;
    

    grep also aliases $_ to the current element, so the "filter expression" can modify the list we're iterating over.

    Finally, because .= returns the resulting string (and strings containing "\n" cannot be false), we can take advantage of the fact that our "filter expression" always returns a true value and effectively get a copy of the input list as the return value from grep:

    @found_lines = grep $_ .= "\n", ...  # blergh
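
To make the aliasing point concrete, here is a small sketch comparing map with the grep trick. The sample data and variable names are made up for illustration; they stand in for the list of regex matches.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the regex match results.
my @matches = ("alpha", "beta");

# map builds new strings and leaves the source list untouched.
my @source_for_map = @matches;
my @with_map       = map { "$_\n" } @source_for_map;

# grep aliases $_ to each element, so .= changes the source list in place.
# The resulting string is never false, so grep also returns every element.
my @source_for_grep = @matches;
my @with_grep       = grep { $_ .= "\n" } @source_for_grep;

print "map left its source alone\n"   if $source_for_map[0]  eq "alpha";
print "grep modified its source\n"    if $source_for_grep[0] eq "alpha\n";
print "both produced the same list\n" if "@with_map" eq "@with_grep";
```

Both calls produce the same result list, but only map does so without touching its input, which is why it is the clearer choice here.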
    

As for the effect of $*: It is a boolean flag (initially false). If set to true, all regexes behave as if /m is in effect, i.e. ^ and $ match at embedded newlines as well as the beginning/end of the string.
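
A minimal sketch of what that means in practice, using the /m modifier that replaces $*. The buffer contents and variable names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical buffer holding three lines in a single string.
my $buf = "one\ntwo\nthree\n";

# Without /m, ^ matches only at the start of the string and $ only at its
# end (or just before a final newline), so no single line satisfies both.
my @plain = ($buf =~ /^\w+$/g);

# With /m, ^ and $ also match around embedded newlines: one match per line.
my @multi = ($buf =~ /^\w+$/mg);

print "without /m: ", scalar(@plain), " matches\n";
print "with /m:    ", scalar(@multi), " matches (@multi)\n";
```

Unlike $*, the modifier applies only to the regex it is attached to, so other regexes in the program are unaffected.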

Assuming my interpretation of the code is correct, you should be able to change it as follows:

  • local ($*); can be removed.
  • $* = 1; also needs to go.
  • $comp_buf =~ /$regexp/g should be changed to $comp_buf =~ /$regexp/mg. This is the only place I see where multiline mode makes sense.
  • I'd really like to rewrite the last line of the if branch (the grep call). Either

    @found_lines = map "$_\n", ($comp_buf =~ /$regexp/mg);
    

    (functional style), or, if you prefer a more imperative style:

    @found_lines = ($comp_buf =~ /$regexp/mg);
    $_ .= "\n" for @found_lines;
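
Putting the changes together, the whole routine might look roughly like the sketch below. Several things here are assumptions: the sub name search_file is made up (the question does not show the original name), and read_comp and read_list are simple stand-ins, since their real implementations are not shown. The anchoring logic is kept as in the original, and the implicit return values are written as explicit return statements.

```perl
use strict;
use warnings;

# Hypothetical stand-in: slurp the whole file into the referenced scalar.
sub read_comp {
    my ($file, $buf_ref) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    local $/;    # slurp mode
    $$buf_ref = <$fh>;
    close $fh;
    return;
}

# Hypothetical stand-in: return the file's lines, newlines included.
sub read_list {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# search_file is a made-up name for the routine from the question.
sub search_file {
    my ($file, $regexp, $flags) = @_;
    my @found_lines;

    if ($flags =~ tr/c//d) {
        # Same anchoring logic as the original, now unaffected by $*.
        $regexp = "^.*$regexp" if substr($regexp, 0, 1) ne "^";
        $regexp .= ".*\$"      if $regexp !~ /([^\\]|^)(\\\\)*\$$/;

        my $comp_buf;
        read_comp($file, \$comp_buf);

        # /m on this one regex replaces the old $* = 1.
        @found_lines = map { "$_\n" } ($comp_buf =~ /$regexp/mg);
    }
    else {
        @found_lines = grep { /$regexp/ } read_list($file, 0);
    }

    return scalar(@found_lines) > 0 if $flags eq "q";    # any match?
    return scalar(@found_lines)     if $flags eq "a";    # number of matches
    return @found_lines;                                 # the matched lines
}
```

With a file containing the lines "foo 1", "bar 2" and "foo 3", calling search_file($path, 'foo', 'c') in list context should return the two foo lines, each with its newline restored.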
    
melpomene answered Oct 11 '22 12:10