
Can Biopython perform Seq.find() accounting for ambiguity codes?

I want to search a Seq object for a subsequence Seq object while accounting for ambiguity codes. For example, the following search should find a match:

from Bio.Seq import Seq
from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA

amb = IUPACAmbiguousDNA()
s1 = Seq("GGAAAAGG", amb)
s2 = Seq("ARAA", amb)     # R = A or G
print(s1.find(s2))

If ambiguity codes were taken into account, the answer should be

>>> 2

But the answer I get is that no match is found, or

>>> -1

Looking at the Biopython source code, it doesn't appear that ambiguity codes are taken into account: the subsequence is converted to a string using the private _get_seq_str_and_check_alphabet method, and then the built-in string method find() is used. If this is the case, the "R" ambiguity code will be taken as a literal "R", not as an A or G.

I could write my own method for this, but it seems like something the Biopython package should handle through its Seq objects. Is there something I am missing here?

Is there a way to search for subsequence membership accounting for ambiguity codes?

asked Aug 24 '15 22:08 by Malonge



1 Answer

From what I can read in the documentation for Seq.find here:

http://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html#find

It appears that this method works similarly to the str.find method, in that it looks for an exact match. So, while the DNA sequence can contain ambiguity codes, the Seq.find() method will only return a match when the subsequence matches character for character.

To do what you want, the nt_search function (in Bio.SeqUtils) may work:

Search for motifs with degenerate positions
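
For illustration, here is a minimal standard-library sketch of the same idea: expand each ambiguity code in the query into a regular-expression character class, then search the target string. The name ambiguous_find and the IUPAC_DNA table are hypothetical helpers written for this example, not part of Biopython, and the sketch only treats ambiguity codes in the query, not in the sequence being searched.

```python
import re

# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def ambiguous_find(seq, sub, start=0):
    """Return the index of the first match of sub in seq, or -1.

    Hypothetical helper: ambiguity codes in sub are expanded to regex
    character classes; seq is compared literally, base by base.
    """
    pattern = "".join(
        IUPAC_DNA[c] if len(IUPAC_DNA[c]) == 1 else "[" + IUPAC_DNA[c] + "]"
        for c in str(sub).upper()
    )
    match = re.compile(pattern).search(str(seq).upper(), start)
    return match.start() if match else -1


print(ambiguous_find("GGAAAAGG", "ARAA"))  # 2
print("GGAAAAGG".find("ARAA"))             # -1 (literal "R" never matches)
```

Because the helper calls str() on its arguments, it should also accept Seq objects. If Biopython is available, Bio.SeqUtils.nt_search(str(s1), "ARAA") takes a similar approach and returns a list holding the expanded pattern followed by the match positions.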

answered Oct 13 '22 14:10 by Vince