In python this code, where I directly call the function SeqIO.parse() , runs fine:
from Bio import SeqIO
a = SeqIO.parse("a.fasta", "fasta")
records = list(a)
for asq in SeqIO.parse("a.fasta", "fasta"):
print("Q")
But this, where I first store the output of SeqIO.parse() in a variable(?) called a, and then try to use it in my loop, it doesn't run:
from Bio import SeqIO
a = SeqIO.parse("a.fasta", "fasta")
records = list(a)
for asq in a:
print("Q")
Is this because a the output from the function || SeqIO.parse("a.fasta", "fasta") || is being stored in 'a' differently from when I directly call it? What exactly is the identity of 'a' here. Is it a variable? Is it an object? What does the function actually return?
SeqIO.parse() returns a normal python generator. This part of the Biopython module is written in pure python:
>>> from Bio import SeqIO
>>> a = SeqIO.parse("a.fasta", "fasta")
>>> type(a)
<class 'generator'>
Once a generator is iterated over it is exhausted as you discovered. You can't rewind a generator but you can store the contents in a list or dict if you don't mind putting it all in memory (useful if you need random access). You can use SeqIO.to_dict(a) to store in a dictionary with the record ids as the keys and sequences as the values. Simply re-building the generator calling SeqIO.parse() again will avoid dumping the file contents into memory of course.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With