Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I change this to "idiomatic" Perl?

I am beginning to delve deeper into Perl, but am having trouble writing "Perl-ly" code instead of writing C in Perl. How can I change the following code to use more Perl idioms, and how should I go about learning the idioms?

Just an explanation of what it is doing: This routine is part of a module that aligns DNA or amino acid sequences(using Needelman-Wunch if you care about such things). It creates two 2d arrays, one to store a score for each position in the two sequences, and one to keep track of the path so the highest-scoring alignment can be recreated later. It works fine, but I know I am not doing things very concisely and clearly.

edit: This was for an assignment. I completed it, but want to clean up my code a bit. The details on implementing the algorithm can be found on the class website if any of you are interested.

sub create_matrix {
    my $self = shift;
    #empty array reference
    my $matrix = $self->{score_matrix};
    #empty array ref
    my $path_matrix = $self->{path_matrix};
    #$seq1 and $seq2 are strings set previously
    my $num_of_rows = length($self->{seq1}) + 1;
    my $num_of_columns = length($self->{seq2}) + 1;

    #create the 2d array of scores
    for (my $i = 0; $i < $num_of_rows; $i++) {
        push(@$matrix, []);
        push(@$path_matrix, []);
        $$matrix[$i][0] = $i * $self->{gap_cost};
        $$path_matrix[$i][0] = 1;
    }

    #fill out the first row
    for (my $i = 0; $i < $num_of_columns; $i++) {
        $$matrix[0][$i] = $i * $self->{gap_cost};
        $$path_matrix[0][$i] = -1;
    }
    #flag to signal end of traceback
    $$path_matrix[0][0] = 2;
    #double for loop to fill out each row
    for (my $row = 1; $row < $num_of_rows; $row++) {
        for (my $column = 1; $column < $num_of_columns; $column++) {
            my $seq1_gap = $$matrix[$row-1][$column] + $self->{gap_cost};
            my $seq2_gap = $$matrix[$row][$column-1] + $self->{gap_cost};
            my $match_mismatch = $$matrix[$row-1][$column-1] + $self->get_match_score(substr($self->{seq1}, $row-1, 1), substr($self->{seq2}, $column-1, 1));
            $$matrix[$row][$column] = max($seq1_gap, $seq2_gap, $match_mismatch);

            #set the path matrix
            #if it was a gap in seq1, -1, if was a (mis)match 0 if was a gap in seq2 1
            if ($$matrix[$row][$column] == $seq1_gap) {
                $$path_matrix[$row][$column] = -1;
            }
            elsif ($$matrix[$row][$column] == $match_mismatch) {
                $$path_matrix[$row][$column] = 0;
            }
            elsif ($$matrix[$row][$column] == $seq2_gap) {
                $$path_matrix[$row][$column] = 1;
            }
        }
    }
}
like image 911
jergason Avatar asked Oct 23 '09 16:10

jergason


1 Answers

You're getting several suggestions regarding syntax, but I would also suggest a more modular approach, if for no other reason that code readability. It's much easier to come up to speed on code if you can perceive the big picture before worrying about low-level details.

Your primary method might look like this.

sub create_matrix {
    my $self = shift;
    $self->create_2d_array_of_scores;
    $self->fill_out_first_row;
    $self->fill_out_other_rows;
}

And you would also have several smaller methods like this:

n_of_rows
n_of_cols
create_2d_array_of_scores
fill_out_first_row
fill_out_other_rows

And you might take it even further by defining even smaller methods -- getters, setters, and so forth. At that point, your middle-level methods like create_2d_array_of_scores would not directly touch the underlying data structure at all.

sub matrix      { shift->{score_matrix} }
sub gap_cost    { shift->{gap_cost}     }

sub set_matrix_value {
    my ($self, $r, $c, $val) = @_;
    $self->matrix->[$r][$c] = $val;
}

# Etc.
like image 150
FMc Avatar answered Sep 27 '22 19:09

FMc