import gzip
import io
from Bio import SeqIO

infile = "myinfile.fastq.gz"
fileout = open("myoutfile.fastq", "w+")
with io.TextIOWrapper(gzip.open(infile, "r")) as f:
    line = f.read()
fileout.write(line)
fileout.seek(0)
count = 0
for rec in SeqIO.parse(fileout, "fastq"):  # parsing from file
    count += 1
print("%i reads" % count)
The above works when "line" is written to a file and that file is fed to the parser, but the version below does not work. Why can't "line" be read directly? Is there a way to feed "line" straight to the parser without having to write it to a file first?
infile = "myinfile.fastq.gz"
#fileout = "myoutfile.fastq"
with io.TextIOWrapper(gzip.open(infile, "r")) as f:
    line = f.read()
#myout.write(line)
count = 0
for rec in SeqIO.parse(line, "fastq"):  # line used instead of writing from file
    count += 1
print("%i reads" % count)
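It's because SeqIO.parse only accepts a file handle or a filename as its first argument. A plain string is interpreted as a filename, so passing the file's contents makes it look for a file with that name. If you want to read a gzipped file directly into SeqIO.parse, just pass a handle to it: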
import gzip
from Bio import SeqIO

count = 0
# Open in text mode ("rt"): gzip.open defaults to binary, but FASTQ parsing needs text
with gzip.open("myinfile.fastq.gz", "rt") as f:
    for rec in SeqIO.parse(f, "fastq"):
        count += 1
print("{} reads".format(count))
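One detail worth checking on Python 3 is the mode of the handle: gzip.open defaults to binary mode, whereas text-based parsers generally expect str. The sketch below uses only the standard library and a made-up two-record FASTQ payload compressed in memory (the payload and variable names are assumptions for illustration, not from the question) to show what each mode returns.

```python
import gzip
import io

# Hypothetical two-record FASTQ payload, used only for illustration.
fastq_text = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n"

# Compress it in memory so the example needs no file on disk.
compressed = gzip.compress(fastq_text.encode("ascii"))

# Default mode is binary ("rb"): reading yields bytes.
with gzip.open(io.BytesIO(compressed)) as handle:
    binary_first_line = handle.readline()

# Text mode ("rt"): reading yields str.
with gzip.open(io.BytesIO(compressed), "rt") as handle:
    text_first_line = handle.readline()

print(type(binary_first_line).__name__, type(text_first_line).__name__)
```

With a real file, the same "rt" argument applies when passing a filename instead of a BytesIO object.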
Just to add to the other answer: if your input sequence comes from something other than a file (e.g. a web query), you can use io.StringIO to simulate a file-like object. A StringIO object behaves like a file handle, but reads from and writes to a memory buffer. The input to StringIO() should be a string, not another file or file handle.
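import gzip
import io
from io import StringIO
from Bio import SeqIO

infile = "myinfile.fastq.gz"
with io.TextIOWrapper(gzip.open(infile, "r")) as f:
    line = f.read()
fastq_io = StringIO(line)
records = SeqIO.parse(fastq_io, "fastq")
# Do something to sequence records here, before closing the buffer:
# SeqIO.parse is lazy and reads from fastq_io as you iterate
for rec in records:
    print(rec.id)
fastq_io.close()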
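To illustrate why the raw string cannot stand in for a handle, here is a small standard-library sketch. The FASTQ text is a made-up example, and count_fastq_records is a hypothetical helper rather than anything from Biopython: it simply counts lines read from a handle and divides by four, assuming well-formed four-line records.

```python
from io import StringIO

# Hypothetical FASTQ text, e.g. the body of a web response.
fastq_text = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n"


def count_fastq_records(handle):
    """Count records by reading lines from a file-like object.

    Assumes well-formed FASTQ with exactly four lines per record.
    """
    line_count = sum(1 for _ in handle)
    return line_count // 4


# Iterating a str yields single characters, not lines.
first_from_string = next(iter(fastq_text))

# Iterating a StringIO yields whole lines, like an open text file.
with StringIO(fastq_text) as handle:
    first_from_handle = next(iter(handle))

with StringIO(fastq_text) as handle:
    n_records = count_fastq_records(handle)

print(repr(first_from_string), repr(first_from_handle), n_records)
```

Anything written against the file interface (readline, line iteration) works with the StringIO wrapper, while the bare string only offers character-level iteration.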
It is worth noting that closing a StringIO object discards its memory buffer, so if you're creating a lot of them, memory use can grow unless you .close() each one when you are done with it. With this in mind, it is probably best practice to use them within a with ... as ...: block:
with StringIO(line) as fastq_io:
    records = SeqIO.parse(fastq_io, "fastq")
    # Do something to sequence records here, inside the block:
    # the buffer is closed as soon as the block exits
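The processing has to happen while the buffer is still open, because a lazy parser only reads from the handle when it is iterated. A minimal sketch of that behaviour, using a hypothetical generator lazy_lines as a stand-in for a lazy parser (an assumption for illustration, not Biopython code):

```python
from io import StringIO


def lazy_lines(handle):
    """Hypothetical stand-in for a lazy parser: yields lines on demand."""
    for line in handle:
        yield line.rstrip("\n")


text = "@read1\nACGT\n+\nIIII\n"

# Consuming inside the block works: the buffer is still open.
with StringIO(text) as handle:
    inside = list(lazy_lines(handle))

# Consuming after the block fails: the buffer was closed on exit.
with StringIO(text) as handle:
    pending = lazy_lines(handle)
try:
    list(pending)
    failed_after_close = False
except ValueError:
    # Reading from a closed StringIO raises ValueError.
    failed_after_close = True

print(inside, failed_after_close)
```

If the records are needed after the block, one option is to materialise them with list(...) while the buffer is still open.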
I've used this technique a fair bit when getting sequence data from web services without wanting to write it to a temporary file.