Slow performance of Perl script using nested for loops

I have a large FASTA file (a genetic sequence, an entire chromosome) in which each line contains 50 characters (the bases a, g, t, and c). The file has about 4 million lines.

I want to reorganize the file so that each character of a line is placed on its own line in a new file. That is, I want to turn each 50-character line in the original file into 50 single-character lines, so that the entire sequence is rewritten as a single column. Ultimately, I want the sequence in a single column so that I can add an adjacent column containing the genomic coordinate of each base.

This is how I am doing it, using Perl and a pair of nested for loops:

unless(@ARGV) {
    # $0 name of the program being executed;
    print "\n usage: $0 filename\n\n"; 
    exit;
}

# use shift to pull the first value off @ARGV and store it in $fastafile;
my $fastafile = shift; 
open(FASTA, "<$fastafile");
my @count =(<FASTA>);
close FASTA;

# print scalar @count;

for ( my $i = 0; $i < scalar @count; $i++ ) {

    #print "$count[$i]\n\n\n\n";
    my @seq = split( "", $count[$i] );
    print " line = $i ";
    for ( my $j = 0; $j < scalar @seq; $j++ ) {
        #my $count =
        print "$seq[$j]  for count = $j \n";
    }

}

It seems to work, but it is very slow. I am wondering whether it is slow because the FASTA file has 4 million lines, because of my code, or both. I am looking for advice on how to speed up this process. Thanks!

ES55 asked Mar 15 '26 02:03


1 Answer

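The problem is that you are slurping the file: my @count = (<FASTA>); reads all of its roughly 4 million lines into an array before any processing begins. The script therefore has to wait for all of the I/O to finish, and it holds the whole file in memory while it runs. An option is to process the file line by line instead: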

# $fastafile is the filename taken from @ARGV, as in the question
open my $fh, '<', $fastafile or die "Error opening file: $!";

while ( my $line = <$fh> ) {
    chomp $line;    # Remove the newline from the end of each line

    my @seq = split //, $line;

    # Loop from 0 to the last index of @seq
    for my $i ( 0 .. $#seq ) {
        print "$seq[$i] for count = $i\n";
    }
}

close $fh;
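
Since the stated goal is a single column of bases with a genomic coordinate beside each one, the line-by-line approach could be extended roughly as follows. This is only a sketch under stated assumptions, not the code from the question or the answer above: the helper name write_base_column is hypothetical, and the tab-separated output, the 1-based coordinates, and the skipping of header lines that start with '>' are all assumptions about what is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): reads FASTA text from
# the filehandle $in one line at a time and writes one "position<TAB>base"
# line per base to the filehandle $out.
# Assumptions: header lines start with '>', coordinates are 1-based.
sub write_base_column {
    my ( $in, $out ) = @_;
    my $pos = 0;

    while ( my $line = <$in> ) {
        chomp $line;
        next if $line =~ /^>/;    # skip FASTA header lines

        for my $base ( split //, $line ) {
            $pos++;
            print {$out} "$pos\t$base\n";
        }
    }

    return $pos;                  # total number of bases written
}

# Run only when a filename is given on the command line.
if (@ARGV) {
    my $fastafile = shift @ARGV;
    open my $fh, '<', $fastafile or die "Error opening '$fastafile': $!";
    write_base_column( $fh, \*STDOUT );
    close $fh;
}
```

Run as perl script.pl chr.fa > chr_column.txt (both file names are placeholders), this keeps only one input line in memory at a time, so memory use does not grow with the size of the chromosome.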

Alan Haggai Alavi answered Mar 16 '26 16:03


