SeqIO.parse on a fasta.gz

Question

I'm new to coding and new to Python/Biopython; this is my first question online, ever. How do I open a compressed fasta.gz file so that I can extract information and perform calculations in my function? Here is a simplified example of what I'm trying to do (I've tried different ways), followed by the error I get. The gzip call I'm using doesn't seem to work.

with gzip.open("practicezip.fasta.gz", "r") as handle:
    for record in SeqIO.parse(handle, "fasta"):
        print(record.id)

Traceback (most recent call last):

  File "<ipython-input-192-a94ad3309a16>", line 2, in <module>
    for record in SeqIO.parse(handle, "fasta"):

  File "C:\Users\Anaconda3\lib\site-packages\Bio\SeqIO\__init__.py", line 600, in parse
    for r in i:

  File "C:\Users\Anaconda3\lib\site-packages\Bio\SeqIO\FastaIO.py", line 122, in FastaIterator
    for title, sequence in SimpleFastaParser(handle):

  File "C:\Users\Anaconda3\lib\site-packages\Bio\SeqIO\FastaIO.py", line 46, in SimpleFastaParser
    if line[0] == ">":

IndexError: index out of range

klim · Accepted Answer

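Are you using Python 3?

If so, opening the file in text mode ("r" --> "rt") should solve your problem. In Python 3, gzip.open treats "r" as binary mode and returns bytes, while the FASTA parser in SeqIO compares each line against the string ">" and therefore needs text.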

import gzip
from Bio import SeqIO

with gzip.open("practicezip.fasta.gz", "rt") as handle:
    for record in SeqIO.parse(handle, "fasta"):
        print(record.id)
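
To see the difference between the two modes without Biopython, here is a minimal standard-library sketch. The file name demo.fasta.gz, the two toy records, and the helper first_line are made up for illustration; they are not from the question.

```python
import gzip
import os
import tempfile

# Hypothetical two-record FASTA, used only for this demonstration.
FASTA_TEXT = ">seq1\nACGT\n>seq2\nGGCC\n"


def first_line(path, mode):
    """Return the first line of a gzip file opened in the given mode."""
    with gzip.open(path, mode) as handle:
        return handle.readline()


# Write a small compressed FASTA file into a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.fasta.gz")
with gzip.open(path, "wt") as out:
    out.write(FASTA_TEXT)

binary_line = first_line(path, "r")   # "r" means binary: a bytes object
text_line = first_line(path, "rt")    # "rt" means text: a str object

# Indexing bytes gives an integer (62 for ">"), so the comparison is False.
print(type(binary_line).__name__, binary_line[0] == ">")   # bytes False
# Indexing str gives a one-character string, so the comparison is True.
print(type(text_line).__name__, text_line[0] == ">")       # str True
```

In binary mode a check like line[0] == ">" never matches, and indexing the empty bytes object returned at end of file raises IndexError, which is consistent with the traceback in the question.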

David Streuli · Answer

@klim's answer is good. However, in some cases you don't want to iterate over every record but just select a single entry by its ID. In such cases, use the following code:

import pyfastx
fa = pyfastx.Fasta('ATEST.fasta.gz')
s1 = fa['KF530110.1']
fa_sequence = s1.seq

On first use, pyfastx indexes every FASTA entry and writes that index to an additional file next to the input (with an .fxi extension), so later lookups by ID are really fast.
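
If installing an extra package is not an option, a rough standard-library fallback is to read the whole file once into a dictionary keyed by record ID. This is only a sketch under stated assumptions: read_fasta_gz is a hypothetical helper, the demo sequence is invented (it is not the real KF530110.1 record), and unlike pyfastx this builds no on-disk index and holds every sequence in memory.

```python
import gzip
import os
import tempfile


def read_fasta_gz(path):
    """Return {record_id: sequence} for a gzip-compressed FASTA file."""
    records = {}
    current_id = None
    with gzip.open(path, "rt") as handle:   # "rt" so that lines are str
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The ID is the first whitespace-delimited word after ">".
                fields = line[1:].split()
                current_id = fields[0] if fields else ""
                records[current_id] = []
            elif current_id is not None:
                records[current_id].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {rid: "".join(parts) for rid, parts in records.items()}


# Demo file with invented content, written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.fasta.gz")
with gzip.open(demo_path, "wt") as out:
    out.write(">KF530110.1 made-up demo record\nACGT\nTTAA\n>other\nGGCC\n")

sequences = read_fasta_gz(demo_path)
print(sequences["KF530110.1"])   # ACGTTTAA
```

For large files the indexed approach above is the better fit, since this fallback re-reads and decompresses the entire file on every run.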

Tags: python, gzip, bioinformatics, biopython

MelBel88
