
Remove stop codon from file and replace it with NNN

I want to write Perl code that checks for stop codons and replaces them with NNN. I have written the following code:

#!/usr/bin/perl
use strict;
use warnings;

# Check if the file name is provided as an argument
my $file = $ARGV[0];

open(my $fh, "<", $file) or die "Unable to open file";

my $sequence = "";
my $id = "";

while (my $line = <$fh>) {
    chomp($line);
    if ($line =~ /^>/) {
        if ($sequence ne "") {
            # Split sequence into codons
            my @codon = $sequence =~ /.{1,3}/g;
            print join(" ", @codon), "\n";
            print $id, "\n";

            # Check for stop codons and replace them with "NNN"
            foreach my $codon (@codon) {
                if ($codon =~ /^(TAG|TGA|TAA)/) {
                    $codon = "NNN";
                }
            }
        }
        $sequence = "";
        $id = $line;
    } else {
        $sequence .= $line;
    }
}

# Print last sequence
if ($sequence) {
    my @codon = $sequence =~ /.{1,3}/g;
    print join(" ", @codon), "\n";
    print $id, "\n";

}

close($fh) or die "Unable to close file";

It should take a FASTA file from the command line and process it: split each sequence into codons (groups of three bases) and replace every stop codon with NNN.

My input sequence looks like this:

>header 
ATGGACCAGCAGCAGCAGCAGCAGTAA

I was expecting something like:

>header 
ATGGACCAGCAGCAGCAGCAGCAGNNN

Also, it did not process the last sequence in the file, and I got this output:

>header
ATG GAC CAG CAG CAG CAG CAG CAG TAA

In addition, the header of the first sequence and the sequence of the last header were missing.

somilsharma asked Oct 23 '25 20:10


1 Answer

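The substitution did not occur because the logic of your program is incorrect. With your single-record input, the following condition is false when the header line is read (no sequence has been accumulated yet), so your replacement code is never executed:

if ($sequence ne "")

Even for a file with several records, that block prints the codons before the replacement loop runs, so the change would never show up in the output. Then, in the # Print last sequence code, you don't try to do the substitution at all.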

Here is a self-contained example that does the substitution:

use warnings;
use strict;

while (my $line = <DATA>) {
    chomp($line);
    if ($line =~ /^>/) {
        print "$line\n";
    } else {
        # Split sequence into codons
        my @codon = $line =~ /.{1,3}/g;

        # Check for stop codons and replace them with "NNN"
        foreach my $codon (@codon) {
            if ($codon =~ /^(TAG|TGA|TAA)/) {
                $codon = "NNN";
            }
        }
        print join(" ", @codon), "\n";
    }
}

__DATA__
>header 
ATGGACCAGCAGCAGCAGCAGCAGTAA

Output:

>header 
ATG GAC CAG CAG CAG CAG CAG CAG NNN
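
The example above treats every line as a complete sequence and prints the codons separated by spaces. If a sequence in your FASTA file is wrapped over several lines, or if you want the unspaced output shown in the question, something along these lines may be closer to what you need. It is only a minimal sketch: the sub names mask_stop_codons and process_fasta and the demo records are made up for illustration, and it assumes uppercase bases with the reading frame starting at the first base of each record.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace in-frame stop codons with NNN and return the unspaced sequence.
# (mask_stop_codons is a hypothetical helper name, not from the original post.)
sub mask_stop_codons {
    my ($sequence) = @_;
    # Split into codons; the last chunk may be shorter than 3 bases.
    my @codons = $sequence =~ /.{1,3}/g;
    for my $codon (@codons) {
        $codon = 'NNN' if $codon =~ /^(?:TAG|TGA|TAA)$/;
    }
    return join '', @codons;
}

# Read FASTA records from a filehandle. Wrapped sequence lines are joined
# before splitting, so the reading frame is kept across line breaks.
# Returns a flat list: header, sequence, header, sequence, ...
sub process_fasta {
    my ($fh) = @_;
    my @out;
    my ($id, $sequence) = ('', '');
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^>/) {
            # A new header closes the previous record, if there was one.
            push @out, $id, mask_stop_codons($sequence) if $id ne '';
            ($id, $sequence) = ($line, '');
        } else {
            $sequence .= $line;
        }
    }
    # Flush the last record, which has no following header to trigger it.
    push @out, $id, mask_stop_codons($sequence) if $id ne '';
    return @out;
}

# Demo input held in a string; the second record is wrapped over two lines.
my $fasta = ">seq1\nATGGACCAGTAA\n>seq2\nATGGA\nCTGA\n";
open(my $fh, '<', \$fasta) or die "Unable to open in-memory file: $!";
print "$_\n" for process_fasta($fh);
close($fh);
```

Output:

>seq1
ATGGACCAGNNN
>seq2
ATGGACNNN

To read a real file, open $ARGV[0] as in the question instead of the reference to $fasta. Use join(" ", @codons) in mask_stop_codons if you prefer the spaced output.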

See also: bioperl

toolic answered Oct 25 '25 14:10