BioPython: How to convert the amino acid alphabet to

Question

When discussing how to import sequence data using Bio.SeqIO.parse(), the BioPython cookbook states that:

There is an optional argument alphabet to specify the alphabet to be used. This is useful for file formats like FASTA where otherwise Bio.SeqIO will default to a generic alphabet.

How do I add this optional argument? I have the following code:

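from os.path import abspath
from Bio import SeqIO

# f_path holds the path to the FASTA file (defined elsewhere).
# The "rU" mode was removed in Python 3.11; plain text mode is enough.
with open(f_path) as handle:
    records = list(SeqIO.parse(handle, "fasta"))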

This imports a large list of FASTA records from a UniProt database. The problem is that the sequences end up with the generic SingleLetterAlphabet class. How do I convert from SingleLetterAlphabet to ExtendedIUPACProtein?

The ultimate goal is to search through these sequences for a motif such as GxxxG.

zero323 · Accepted Answer

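Import the alphabet and pass it to SeqIO.parse as the alphabet argument. Note that this works only in Biopython releases before 1.78, which removed the Bio.Alphabet module: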

# Import required alphabet
from Bio.Alphabet import IUPAC

# Pass imported alphabet as an argument for `SeqIO.parse`:
records = list(SeqIO.parse(handle, 'fasta', IUPAC.extended_protein))
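
For the motif search mentioned in the question, here is a minimal sketch that assumes "x" in GxxxG means any single residue. It works on plain strings, so it does not depend on which alphabet the records carry. The names find_motif and GXXXG are hypothetical helpers made up for this example, not part of Biopython.

```python
import re

# Hypothetical pattern for GxxxG: G, any three residues, then G.
# The lookahead lets overlapping hits (e.g. GAAAGAAAG) all be reported.
GXXXG = re.compile(r"(?=(G.{3}G))")


def find_motif(sequence, pattern=GXXXG):
    """Return a list of (0-based start, matched text) for every hit."""
    # str() also accepts a Biopython Seq object, e.g. record.seq
    return [(m.start(), m.group(1)) for m in pattern.finditer(str(sequence))]


if __name__ == "__main__":
    # Toy sequence for illustration only, not real UniProt data
    for start, hit in find_motif("MGAAAGAAAGL"):
        print(start, hit)
```

With the parsed records, something like find_motif(record.seq) for each record in records should give the hit positions per sequence.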

Tags: python, bioinformatics, biopython

Kevin
