When discussing how to import sequence data using Bio.SeqIO.parse(), the Biopython cookbook states that:
There is an optional argument alphabet to specify the alphabet to be used. This is useful for file formats like FASTA where otherwise Bio.SeqIO will default to a generic alphabet.
How do I add this optional argument? I have the following code:
from os.path import abspath
from Bio import SeqIO
handle = open(f_path, "rU")
records = list(SeqIO.parse(handle, "fasta"))
handle.close()
This imports a large list of FASTA records from a UniProt database. The problem is that the sequences come in with the generic SingleLetterAlphabet class. How do I convert from SingleLetterAlphabet to ExtendedIUPACProtein?
The ultimate goal is to search through these sequences for a motif such as GxxxG.
Like this:
# Import required alphabet
from Bio.Alphabet import IUPAC
# Pass imported alphabet as an argument for `SeqIO.parse`:
records = list(SeqIO.parse(handle, 'fasta', IUPAC.extended_protein))
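For the GxxxG search mentioned in the question, the alphabet should not matter, since the match can be run on the plain string form of each sequence. Below is a minimal sketch using only the standard `re` module. The helper name `find_motif` and the example sequence are made up for illustration, and treating `x` as "any single residue" is an assumption about what the motif means. Note also that `Bio.Alphabet` was removed in Biopython 1.78, so on newer versions the `alphabet` argument above is no longer available and sequences are parsed without one.

```python
import re

# "x" in GxxxG is taken to mean "any one residue" (an assumption).
# The lookahead (?=...) lets overlapping hits such as GAAAGAAAG be reported.
GXXXG = re.compile(r"(?=(G.{3}G))")


def find_motif(sequence, pattern=GXXXG):
    """Return a list of (start, matched_text) tuples, with 0-based starts.

    `sequence` may be a plain str or anything whose str() is the residue
    string (for example a Bio.Seq.Seq object).
    """
    return [(m.start(), m.group(1)) for m in pattern.finditer(str(sequence))]


# Hypothetical example sequence, not taken from UniProt.
print(find_motif("MGAAAGLLGIVVG"))
```

With the `records` list from the question, the same helper could be applied as `find_motif(record.seq)` for each `record`, keeping `record.id` alongside the hits to know which protein they came from.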