
How can I find multiple motifs (substrings) in a protein sequence (string)?

The following script finds one motif in a protein sequence.

use strict;
use warnings;

my @file_data=();
my $protein_seq='';
my $h= '[VLIM]';   
my $s= '[AG]';
my $x= '[ARNDCEQGHILKMFPSTWYV]';
my $regexp = "($h){4}D($x){4}D"; #motif to be searched is hhhhDxxxxD
my @locations=();

@file_data= get_file_data("seq.txt");

$protein_seq= extract_sequence(@file_data); 

#searching for a motif hhhhDxxxxD in each protein sequence in the given file

foreach my $line (@file_data) {
    if ($line =~ /$regexp/) {
        print "found motif \n\n";
    } else {
        print "not found \n\n";
    }
}
#recording the location/position of the motif to be output

@locations = match_positions($regexp, $protein_seq);
if (@locations) {
    print "Searching for motifs $regexp \n";
    print "Catalytic site is at location:\n";
    print join(", ", @locations), "\n";
} else {
    print "motif not found \n\n";
}
exit;

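sub get_file_data {
    my ($filename) = @_;

    # Read every line of the file into a list
    open(my $fh, '<', $filename)
        or die "Cannot open file \"$filename\": $!\n";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

sub extract_sequence {
    my (@fasta_file_data) = @_;
    my $sequence = '';

    foreach my $line (@fasta_file_data) {
        # Skip blank lines, comment lines, and FASTA header lines
        if ($line =~ /^\s*(#.*)?$|^>/) {
            next;
        } else {
            $sequence .= $line;
        }
    }
    # Remove whitespace, including newlines
    $sequence =~ s/\s//g;
    return $sequence;
}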

sub match_positions {
    my ($regexp, $sequence) = @_;
    my @position = ();
    # $-[0] is the 0-based offset where the current match starts
    while ($sequence =~ /$regexp/ig) {
        push(@position, $-[0]);
    }
    return @position;
}

I am not sure how to extend this to find multiple motifs (in a fixed order, i.e. motif1, motif2, motif3) in a given file containing a protein sequence.

shubster asked May 06 '09 23:05



1 Answer

You could simply use an alternation (alternatives separated by |) of the motif patterns. The regex engine then matches whichever of them it finds.

/((?:$h){4}D(?:$x){4}D|(?:$x){1,4}A{1,2}(?:$s){2})/

Then you can test this match by looking at $1.
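
As a rough sketch of how this could look in a full script, the code below gives each motif its own capture group so you can tell which one matched and where. Because an alternation does not enforce any order, it also shows one possible way to require motif1, then motif2, by resuming each search where the previous match ended. The sub names find_any_motif and find_motifs_in_order and the demo sequence are made up for illustration; the second motif is just the example pattern above, not a known catalytic site.

```perl
use strict;
use warnings;

# Character classes as defined in the question.
my $h = '[VLIM]';
my $s = '[AG]';
my $x = '[ARNDCEQGHILKMFPSTWYV]';

# Motif 1 is the question's hhhhDxxxxD; motif 2 is the second pattern above.
# Writing (?:$h){4} keeps Perl from parsing $h{4} as a hash element.
my @motifs = (
    "(?:$h){4}D(?:$x){4}D",
    "(?:$x){1,4}A{1,2}(?:$s){2}",
);

# Report every hit of any motif as [motif number, 0-based offset, matched text].
sub find_any_motif {
    my ($sequence, @patterns) = @_;
    # One capture group per motif: (motif1)|(motif2)|...
    my $alternation = join '|', map { "($_)" } @patterns;
    my @hits;
    while ($sequence =~ /$alternation/g) {
        # Only the group of the alternative that matched has a defined offset.
        for my $i (1 .. scalar @patterns) {
            next unless defined $-[$i];
            push @hits, [$i, $-[$i], substr($sequence, $-[$i], $+[$i] - $-[$i])];
        }
    }
    return @hits;
}

# Take the leftmost match of each motif in turn, each search starting where
# the previous match ended. Returns the start offsets, or () if one is missing.
sub find_motifs_in_order {
    my ($sequence, @patterns) = @_;
    my @starts;
    pos($sequence) = 0;
    for my $pattern (@patterns) {
        # In scalar context //g resumes from pos($sequence).
        return () unless $sequence =~ /$pattern/g;
        push @starts, $-[0];
    }
    return @starts;
}

my $demo = 'KKVLIMDARNDDKKAGGKK';    # made-up sequence for illustration
printf "motif %d at offset %d: %s\n", @$_ for find_any_motif($demo, @motifs);
# motif 1 at offset 2: VLIMDARNDD
# motif 2 at offset 12: KKAGG

my @ordered = find_motifs_in_order($demo, @motifs);
print "in order at offsets: @ordered\n";    # in order at offsets: 2 12
```

Passing the motifs in the opposite order to find_motifs_in_order returns an empty list for this sequence, since motif 1 does not occur after motif 2. Note that this approach only considers the leftmost match of each motif, so it is a simplification if the motifs can overlap or match with different lengths.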

Axeman answered Oct 05 '22 11:10