After a big break of ~6 months I am back in the world of Perl and Bioinformatics, interning under a different scientist. But the very first assignment is unlike any I had encountered last time, so while I have made some progress, I haven't been able to tackle the problem in its entirety. I am also trying to revise whatever I learnt last time as fast as possible, because I completely lost touch with programming these last 6 months. The dataset looks like the following:
NR_046018 DDX11L1 , 0 0 1 1 1 1 1 1 1 1 0 0 0 0 1.44 2.72 3.84 4.92 5.6 6.64 7.08 9.12 9.56 8.28 7.16 6.08 5.4 4.36 3.92 1.88 0 0 0.76 1 1 1 1.2 2 2 2 1.72 2 2 2 1.8 1 1.88 2.4 3 3.36 5 6 6 6.72 6.12 5.6 5.44 5.56 5 4.04 5 4.28 4 4 3.08 2.08 1.68 1.96 1.44 3 3.68 4 4.16 5 4.32 4.8 6.16 6 6.28 6.92 7.84 7 7.32 7.2 5.96 5 4.52 4.08 3 3 4.04 4.12 4.44 4 3.52 3.4 4 4 2.64 1.88 1 1 1 0.64 1 1 1.24 2 2.92 3 3 2.96 2 2 2.56 2 1.08 2.12 3 3 3 3 2.6 3 4.64 3.88 3.72 4 4 4.96 4.6 4 2.36 2 1.28 1 1 0.04 0 0.24 1.08 2.68 3.84 4.12 5.72 6 6 5.76 4.92 3.32 3.12 2.88 2.08 2 2 2 2 2 1.44 2.92 3.04 4.28 5.8 7.8 9.48 10.52 13.04 12.08 11.6 11.72 11 9.2 7.52 7.12 7.08 7.08 8.32 7 6.6 7.6 8.04 8.36 6.72 7.88 7.72 8.4 9.24 8.88 8.96 9.88 10.08 9.24 9.28 10.16 11.04 10.52 10 8.56 8 7.8 7.72 6.44 4.32 4 4 3.72 3.68 3.68 3.28 5.56 7.36 9.48 10 10.52 11 12.16 11.96 9.44 8.64 7.52 7 6.48 6 5 5.12 6.28 6 5.52 6 6.68 6.08 7.52 8.16 7.72 8.52 8.56 9.2 9.16 8.92 7.44 6 5 3.48 2.92 2.16 2 2 1.2 1 1 1 1.24 1.64 1 1 1.96 2 2 2 1.76 1 1 1 0.52 1.76 3.64 5.12 6 6 6 6 5.52 4.24 2.36 0.88 0 0 0.68 1 1 1 1 1 1 1 0.32 0 0 1 1 1.44 2.44 3.68 5.4 6.88 7 6 6.52 6.76 6.56 5.32 3.6 2.92 3 3.72 3.96 3.8 3 3 3 2.2 2.4 2.28 1.52 1 1 1 1.72 2 1.6 1 1 1 1 1 0.28 0.92 2 2 2.72 3.64 4 4.84 5 4.08 3 3 2.68 2.36 2 1.16 1 1 2 4.92 4.6 4 4 4 4 4.32 4 1.08 1 1.52 2 2 2 1.68 1 1 1.32 1.48 1 1 1.52 2 2 2 1.68 1 1 1.88 1.48 1 1 1 1 1 1 0.12 0.4 1 1 1.2 3.88 4 5 5 4.6 4 4 3.8 2.08 2 1 1 1.44 2.4 3
NR_047520 LOC643837 , 3 2.2 0.2 0 0 0.28 1 1 1 1 2.2 4.8 5 5.32 5 5 5 5 3.8 1.2 1 0.4 0 0 0 0 0 1 1 1 1 1 1 1 1.56 1 1 1 1 1 1 1 0.44 0.68 1 1.52 3 3.6 4.96 6.8 9 8.32 8.72 8.48 7 7.4 8.8 7.92 7.12 8.84 8.56 9.4 10.2 10 7.24 6.44 6.76 6.16 5.72 4.96 4.8 5.16 6 5.84 4.12 3 3 2.64 2.56 3.08 3 4.16 5 6.72 7 7.16 7.44 5.76 5 4.56 4 3.68 5 5.4 5.52 6 6 5.28 5 3.6 2 2.08 1.48 1 2 2 2 2 2 1.36 1 1 0 0 0.68 1 1 1 1 1 1 1 0.32 0 0 0 1.16 2 2 2 2 2.88 3 3 1.84 1 2 2 2.04 2.12 2 2 2 2 1 1.28 1.96 1.36 2.76 3 3 3 3 2.72 2 1.64 0.76 1 1.36 2 2 2 2 2 1.48 1 0.64 0 0.08 1 1 1.08 2 2 2 2 2.68 2 2 2.16 3.4 4 4 4.2 4.24 4 5.68 6.52 4.6 4 4 3.8 3.8 4 3.12 2.24 2.6 3 4 4 3.2 3 2.2 2 1.4 1.84 1.24 2 2 2 2 2 2 1.16 0.76 0 0 0 0 0 0 0 0.36 1 1.68 2 2 2.92 5.4 6.76 7.64 7 6.88 7 7.36 7.92 6.24 5.92 7.04 9.52 11.52 12.88 14.8 16.36 19.88 22.24 20 19.36 16.92 15.24 13.84 10.88 8.24 5.08 4.96 3.12 3 2.88 2 2.8 2.96 4 4.44 5 6 6 6 5.12 3.28 2 1.56 1 0.08 1.68 2 2 2.84 3 3 3.8 3.92 2.32 2 2.2 2.16 2 2 1.2 1 1 1 0.8 0 0 0 0.72 2.88 3 3 3 3 3 3 2.28 0.12 0 0.52 1 1 1 1 1.44 2 2 1.48 1 1 1 1.56 1.56 1 1 1 1 1 1 0.44 0.8 1.48 3 3 3 3 3 3.56 3.2 2.76 2 2 2 2 2.68 2.44 2 1.76 1 1.4 2 2 1.56 2 2 2 2 2.04 2 2 1.76 1 1 1 1 0.56 0 0 0 0 0 0 0 0 0.72 1.52 2 2 2 2 2 2 1.28 0.48
1. What is needed
2. Strategy I was thinking
@values
use List::Util qw(max)
to find maximum value from the array3. Code written so far
use warnings;
use List::Util qw(max);
#Input filename
$file = 'test1.data';
#Open file
open I, '<', $file or die;
#Separate data into keys and values, based on ','
chop (%hash = map { split /\s*,\s*/,$_,2 } grep (!/^$/,<I>));
print "$_ => $hash{$_}\n" for keys %hash; #Code is working fine till here
#Create a values array
@values = values %hash;
foreach $value(@values){
print "The values are : ", $value,"\n";
}
4. The Problem
Beyond this, I am not able to figure out how to add each "individual"
array
element into a new array so that I may use the max function.
What I mean is that for example, the first array element
in @values
contains data like 0 0 1 1 3 4.4
. The second array element
might have data like 3 2.2 0.28 1 1 4.8
. So I need to put each
of these array elements into a new array, each element going into a different array so that I may be able to use the max
function.
5. Points to Note
Most of the rows contain 400 numbers, some have a little less than that, but never more than 400.
There are a total of 23,558 rows.
I would be grateful to anyone who would be kind enough to point me in the right direction, or perhaps provide a better code to tackle the problem as mentioned in 1.
use List::Util qw( min max ); my $min = min @numbers; my $max = max @numbers; But List::MoreUtils's minmax is more efficient when you need both the min and the max (because it does fewer comparisons).
If I understand your problem correctly you're making it overly complicated:
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max);
#Input filename
my $file = 'test1.data';
#Open file
open my $fh, '<', $file or die "Unable to open $file: $!\n";
my ($total, $num);
while (<$fh>) {
my @values = split;
my $max = max(@values[3 .. $#values]);
$total += $max;
$num++;
}
my $average_max = $total / $num;
Just make one pass over your file, splitting the lines into an array and feeding everything from index 3
to max
. Add $max
to $total
for each line, increment a counter ($num
) and calculate average max from that.
You should also always use use strict
and lexical filehandles.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With