Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pimp my Perl code

I'm an experienced developer, but not in Perl. I usually learn Perl to hack a script, then I forget it again until the next time. Hence I'm looking for advice from the pros.

This time around I'm building a series of data analysis scripts. Grossly simplified, the program structure is like this:

01 my $config_var = 999;

03 my $result_var = 0;

05 foreach my $file (@files) {
06   open(my $fh, $file);
07   while (<$fh>) {
08     &analyzeLine($_);
09   }
10 }

12 print "$result_var\n";

14 sub analyzeLine ($) {
15   my $line = shift(@_);
16   $result_var = $result_var + calculatedStuff;
17 }

In real life, there are up to about half a dozen different config_vars and result_vars.

These scripts differ mostly in the values assigned to the config_vars. The main loop will be the same in every case, and analyzeLine() will be mostly the same but could have some small variations.

I can accomplish my purpose by making N copies of this code, with small changes here and there; but that grossly violates all kinds of rules of good design. Ideally, I would like to write a series of scripts containing only a set of config var initializations, followed by

do theCommonStuff;

Note that config_var (and its siblings) must be available to the common code, as must result_var and its lookalikes, upon which analyzeLine() does some calculations.

Should I pack my "common" code into a module? Create a class? Use global variables?

While not exactly code golf, I'm looking for a simple, compact solution that will allow me to DRY and write code only for the differences. I think I would rather not drive the code off a huge table containing all the configs, and certainly not adapt it to use a database.

Looking forward to your suggestions, and thanks!


Update

Since people asked, here's the real analyzeLine:

# Update stats with time and call data in one line.
sub processLine ($) {
  my $line = shift(@_);
  return unless $line =~ m/$log_match/;
  # print "$1 $2\n";
  my ($minute, $function) = ($1, $2);
  $startMinute = $minute if not $startMinute;
  $endMinute = $minute;
  if ($minute eq $currentMinute) {
    $minuteCount = $minuteCount + 1;
  } else {
    if ($minuteCount > $topMinuteCount) {
      $topMinute = $currentMinute;
      $topMinuteCount = $minuteCount;
      printf ("%40s %s : %d\n", '', $topMinute, $topMinuteCount);
    }
    $totalMinutes = $totalMinutes + 1;
    $totalCount = $totalCount + $minuteCount;
    $currentMinute = $minute;
    $minuteCount = 1;
  }
}

Since these variables are largely interdependent, I think a functional solution with separate calculations won't be practical. I apologize for misleading people.

like image 209
Carl Smotricz Avatar asked Nov 30 '22 11:11

Carl Smotricz


2 Answers

Two comments: First, don't post line numbers as they make it more difficult than necessary to copy, paste and edit. Second, don't use &func() to invoke a sub. See perldoc perlsub:

A subroutine may be called using an explicit & prefix. The & is optional in modern Perl, ... Not only does the & form make the argument list optional, it also disables any prototype checking on arguments you do provide.

In short, using & can be surprising unless you know what you are doing and why you are doing it.

Also, don't use prototypes in Perl. They are not the same as prototypes in other languages and, again, can have very surprising effects unless you know what you are doing.

Do not forget to check the return value of system calls such as open. Use autodie with modern perls.

For your specific problem, collect all configuration variables in a hash. Pass that hash to analyzeLine.

#!/usr/bin/perl

use warnings; use strict;
use autodie;

my %config = (
    frobnicate => 'yes',
    machinate  => 'no',
);

my $result;
$result += analyze_file(\%config, $_) for @ARGV;

print "Result = $result\n";

sub analyze_file {
    my ($config, $file) = @_;

    my $result;

    open my $fh, '<', $file;
    while ( my $line = <$fh> ) {
        $result += analyze_line($config, $line);
    }

    close $fh;

    return $result;
}

sub analyze_line {
    my ($line) = @_;
    return length $line;
}

Of course, you will note that $config is being passed all over the place, which means you might want to turn this in to a OO solution:

#!/usr/bin/perl

package My::Analyzer;

use strict; use warnings;

use base 'Class::Accessor::Faster';

__PACKAGE__->follow_best_practice;
__PACKAGE__->mk_accessors( qw( analyzer frobnicate machinate ) );

sub analyze_file {
    my $self = shift;
    my ($file) = @_;

    my $result;

    open my $fh, '<', $file;
    while ( my $line = <$fh> ) {
        $result += $self->analyze_line($line);
    }

    close $fh;

    return $result;
}

sub analyze_line {
    my $self = shift;
    my ($line) = @_;
    return $self->get_analyzer->($line);
}

package main;

use warnings; use strict;
use autodie;

my $x = My::Analyzer->new;

$x->set_analyzer(sub {
        my $length; $length += length $_ for @_; return $length;
});
$x->set_frobnicate('yes');
$x->set_machinate('no');


my $result;
$result += $x->analyze_file($_) for @ARGV;

print "Result = $result\n";
like image 125
Sinan Ünür Avatar answered Dec 05 '22 07:12

Sinan Ünür


Go ahead and create a class hierarchy. Your task is an ideal playground for OOP style of programming. Here's an example:

package Common;
sub new{
  my $class=shift;
  my $this=bless{},$class;
  $this->init();
  return $this;
}
sub init{}
sub theCommonStuff(){ 
  my $this=shift;
  for(1..10){ $this->analyzeLine($_); }
}
sub analyzeLine(){
  my($this,$line)=@_;
  $this->{'result'}.=$line;
}

package Special1;
our @ISA=qw/Common/;
sub init{
  my $this=shift;
  $this->{'sep'}=',';   # special param: separator
}
sub analyzeLine(){      # modified logic
  my($this,$line)=@_;
  $this->{'result'}.=$line.$this->{'sep'};
}

package main;
my $c = new Common;
my $s = new Special1;
$c->theCommonStuff;
$s->theCommonStuff;
print $c->{'result'}."\n";
print $s->{'result'}."\n";
like image 42
catwalk Avatar answered Dec 05 '22 07:12

catwalk