Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this statement treated as a string instead of its result?

Tags:

string

regex

perl

I am trying to perform some composition-based filtering on a large collection of strings (protein sequences).
I wrote a group of three subroutines in order to take care of it, but I'm running into trouble in two ways - one minor, one major. The minor trouble is that when I use List::MoreUtils 'pairwise' I get warnings about using $a and $b only once and them being uninitialized. But I believe I'm calling this method properly (based on CPAN's entry for it and some examples from the web).
The major trouble is an error "Can't use string ("17/32") as HASH ref while "strict refs" in use..."

It seems like this can only happen if the foreach loop in &comp is giving the hash values as a string instead of evaluating the division operation. I'm sure I've made a rookie mistake, but can't find the answer on the web. The first time I even looked at perl code was last Wednesday...

use List::Util;
use List::MoreUtils;
my @alphabet = (
 'A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I',
 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V'
);
my $gapchr = '-';
# Takes a sequence and returns letter => occurrence count pairs as hash.
sub getcounts {
 my %counts = ();
 foreach my $chr (@alphabet) {
  $counts{$chr} = ( $_[0] =~ tr/$chr/$chr/ );
 }
 $counts{'gap'} = ( $_[0] =~ tr/$gapchr/$gapchr/ );
 return %counts;
}

# Takes a sequence and returns letter => fractional composition pairs as a hash.
sub comp {
 my %comp = getcounts( $_[0] );
 foreach my $chr (@alphabet) {
  $comp{$chr} = $comp{$chr} / ( length( $_[0] ) - $comp{'gap'} );
 }
 return %comp;
}

# Takes two sequences and returns a measure of the composition difference between them, as a scalar.
# Originally all on one line but it was unreadable.

sub dcomp {
 my @dcomp = pairwise { $a - $b } @{ values( %{ comp( $_[0] ) } ) }, @{ values( %{ comp( $_[1] ) } ) };
 @dcomp = apply { $_ ** 2 } @dcomp;
 my $dcomp = sqrt( sum( 0, @dcomp ) ) / 20;
 return $dcomp;
}

Much appreciation for any answers or advice!

like image 713
reve_etrange Avatar asked Apr 06 '10 10:04

reve_etrange


1 Answers

There are a few bugs in your code. First, note from perldoc perlop:

Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation.

So your counting method is incorrect. I also believe you are misusing pairwise. It is hard to evaluate what constitutes correct usage because you do not give examples of what output you should get with some simple inputs.

In any case, I would re-write this script as (there are some debugging statements sprinkled in):

#!/usr/bin/perl

use List::AllUtils qw( sum );
use YAML;

our ($a, $b);
my @alphabet = ('A' .. 'Z');
my $gap = '-';

my $seq1 = 'ABCD-EFGH--MNOP';
my $seq2 = 'EFGH-ZZZH-KLMN';

print composition_difference($seq1, $seq2);

sub getcounts {
    my ($seq) = @_;
    my %counts;
    my $pattern = join '|', @alphabet, $gap;
    $counts{$1} ++ while $seq =~ /($pattern)/g;
    warn Dump \%counts;
    return \%counts;
}

sub fractional_composition_pairs {
    my ($seq) = @_;
    my $comp = getcounts( $seq );
    my $denom = length $seq - $comp->{$gap};
    $comp->{$_} /= $denom for @alphabet;
    warn Dump $comp;
    return $comp;
}

sub composition_difference {
    # I think your use of pairwise in the original script
    # is very buggy unless every sequence always contains
    # all the letters in the alphabet and the gap character.
    # Is the gap character supposed to factor in the computations here?

    my ($comp1, $comp2) = map { fractional_composition_pairs($_) } @_;
    my %union;
    ++ $union{$_} for (keys %$comp1, keys %$comp2);

    my $dcomp;
    {
        no warnings 'uninitialized';
        $dcomp = sum map {
            ($comp1->{$_} - $comp2->{$_}) ** 2
        } keys %union;
    }

    return sqrt( $dcomp ) / 20; # where did 20 come from?
}
like image 151
Sinan Ünür Avatar answered Oct 19 '22 23:10

Sinan Ünür