 

BioPython: How to convert the amino acid alphabet to

When discussing how to import sequence data using Bio.SeqIO.parse(), the BioPython cookbook states that:

There is an optional argument alphabet to specify the alphabet to be used. This is useful for file formats like FASTA where otherwise Bio.SeqIO will default to a generic alphabet.

How do I add this optional argument? I have the following code:

from Bio import SeqIO

# f_path holds the path to the UniProt FASTA file
# ("rU" mode was removed in Python 3.11; plain text mode is enough)
with open(f_path) as handle:
    records = list(SeqIO.parse(handle, "fasta"))

This imports a large number of FASTA records from a UniProt database. The problem is that the records use the generic SingleLetterAlphabet. How do I convert from SingleLetterAlphabet to ExtendedIUPACProtein?

The ultimate goal is to search through these sequences for a motif such as GxxxG.
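A regular-expression search works on the plain sequence string whatever alphabet the record carries. A minimal sketch using only Python's `re` module, assuming the sequences are uppercase one-letter strings such as `str(record.seq)`; `find_gxxxg` is a hypothetical helper name, not part of Biopython:

```python
import re

# GxxxG: a glycine, any three residues, then another glycine.
# The lookahead (?=...) lets overlapping matches be reported,
# e.g. "GAGAGAG" contains GxxxG starting at positions 0 and 2.
GXXXG = re.compile(r"(?=(G...G))")


def find_gxxxg(sequence):
    """Return (0-based start, matched text) for every GxxxG occurrence."""
    return [(m.start(), m.group(1)) for m in GXXXG.finditer(sequence)]


# Hypothetical usage with the parsed records (record.id and record.seq
# are SeqRecord attributes):
# for record in records:
#     for start, hit in find_gxxxg(str(record.seq)):
#         print(record.id, start + 1, hit)

print(find_gxxxg("MKGAGAGAGL"))
```

Without the lookahead, `finditer` would skip overlapping hits, which matters for glycine-rich stretches.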

Kevin asked Mar 21 '23 22:03



1 Answer

Like this:

# Import required alphabet
from Bio.Alphabet import IUPAC

# Pass imported alphabet as an argument for `SeqIO.parse`:
records = list(SeqIO.parse(handle, 'fasta', IUPAC.extended_protein))
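`Bio.Alphabet` was removed in Biopython 1.78, so this import fails on current releases and parsed records no longer carry an alphabet. If the aim is only to confirm that every sequence uses the extended IUPAC protein letters before searching it, a minimal pure-Python check is sketched below. The letter set is hard-coded as an assumption that it matches Biopython's `Bio.Data.IUPACData.extended_protein_letters`; `invalid_letters` and `is_extended_protein` are hypothetical helper names.

```python
# Extended IUPAC protein letters: the 20 standard amino acids plus
# B (Asx), X (any), Z (Glx), J (Xle), U (selenocysteine), O (pyrrolysine).
# Assumed to match Bio.Data.IUPACData.extended_protein_letters.
EXTENDED_PROTEIN_LETTERS = frozenset("ACDEFGHIKLMNPQRSTVWYBXZJUO")


def invalid_letters(sequence):
    """Return sorted characters not in the extended protein alphabet.

    The check is case-sensitive; UniProt FASTA files use uppercase.
    """
    return sorted(set(sequence) - EXTENDED_PROTEIN_LETTERS)


def is_extended_protein(sequence):
    """True if every residue is an extended IUPAC protein letter."""
    return not invalid_letters(sequence)


# Hypothetical usage with records from SeqIO.parse:
# bad = {r.id: invalid_letters(str(r.seq))
#        for r in records if not is_extended_protein(str(r.seq))}

print(is_extended_protein("MKGAGAGAGL"))
print(invalid_letters("MKG*1"))
```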
zero323 answered Apr 01 '23 15:04
