I have an HTTP response from urllib2:
response = urllib2.urlopen('http://python.org/')
Eventually, I want to be able to seek() within the response (at least back to the beginning), so that code like this works:
print result.readline()
result.seek(0)
print result.readline()
The simplest solution to this problem is StringIO or io.BytesIO, like this:
result = io.BytesIO(response.read())
However, the resources I want to request tend to be very large, and I want to start working with them (parsing, for example) before the whole download has finished. response.read() blocks until the entire body has been received. I'm looking for a non-blocking solution.
The ideal code would read(BUFFER_SIZE) from the resource and, whenever more content is needed, just request more from the response. I'm basically looking for a wrapper class that can do that. Oh, and I need a file-like object.
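To make the chunked reading described above concrete, here is a minimal sketch. It assumes only that the response exposes a file-like read(size) method; an io.BytesIO object stands in for the real response, and iter_chunks and the tiny BUFFER_SIZE value are hypothetical names and values chosen for illustration.

```python
import io
from functools import partial

BUFFER_SIZE = 4  # deliberately tiny so the demo produces several chunks


def iter_chunks(fp, size=BUFFER_SIZE):
    """Return an iterator over blocks of at most `size` bytes read from fp."""
    # iter(callable, sentinel) keeps calling fp.read(size) until it returns b''.
    return iter(partial(fp.read, size), b'')


# Stand-in for urllib2.urlopen(...): any object with read(size) will do.
fake_response = io.BytesIO(b'line one\nline two\n')
chunks = list(iter_chunks(fake_response))
```

Each call to read(size) only waits for that one block, so a parser can start on the first chunk while the rest is still downloading. This alone does not give seek(); the chunks would still have to be kept somewhere.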
I thought I could write something like:
base = io.BufferedIOBase(response)
result = io.BufferedReader(base)
However, it turns out that this does not work. I have tried different classes from the io module but couldn't get any of them working. I'm happy with any wrapper class that has the desired behaviour.
I wrote my own wrapper class which preserves the first chunk of data. This way I can seek back to the beginning, analyze the encoding, file type and other things. This class solves the problem for me and should be simple enough to adapt to other use cases.
import cStringIO


class BufferedFile(object):
    ''' A buffered file that preserves the beginning of a stream up to buffer_size
    '''
    def __init__(self, fp, buffer_size=1024):
        self.data = cStringIO.StringIO()  # copy of the start of the stream
        self.fp = fp                      # underlying file-like object
        self.offset = 0                   # current logical read position
        self.len = 0                      # number of bytes held in self.data
        self.fp_offset = 0                # number of bytes consumed from fp
        self.buffer_size = buffer_size

    @property
    def _buffer_full(self):
        return self.len >= self.buffer_size

    def readline(self):
        # Positions in [len, fp_offset) were read from fp but never buffered.
        if self.len <= self.offset < self.fp_offset:
            raise BufferError('Line is not available anymore')
        if self.offset >= self.len:
            # Past the buffered region: read from the stream itself.
            line = self.fp.readline()
            self.fp_offset += len(line)
            self.offset += len(line)
            if not self._buffer_full:
                self.data.write(line)
                self.len += len(line)
        else:
            # Inside the buffered region: serve the line from the buffer.
            line = self.data.readline()
            self.offset += len(line)
        return line

    def seek(self, offset):
        if self.len <= offset < self.fp_offset:
            raise BufferError('Cannot seek because data is not buffered here')
        self.offset = offset
        # Keep the buffer position in step, so later writes append at its end.
        self.data.seek(min(offset, self.len))
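If being able to seek back to any earlier position matters more than bounded memory, one possible variant is to cache everything read so far instead of only the first buffer_size bytes. The following is a sketch under that assumption, not a tested replacement for the class above: CachingReader and chunk_size are hypothetical names, the wrapped object only needs a read(size) method, and an io.BytesIO object stands in for the HTTP response.

```python
import io


class CachingReader(object):
    """File-like wrapper that keeps a copy of every byte read from fp so far."""

    def __init__(self, fp, chunk_size=8192):
        self.fp = fp
        self.chunk_size = chunk_size
        self.cache = io.BytesIO()  # all bytes pulled from fp so far
        self.cached = 0            # number of bytes in the cache
        self.pos = 0               # logical read position
        self.eof = False           # True once fp has returned b''

    def _fill(self, upto):
        """Pull chunks from fp until the cache holds `upto` bytes (None = all)."""
        self.cache.seek(0, io.SEEK_END)
        while not self.eof and (upto is None or self.cached < upto):
            chunk = self.fp.read(self.chunk_size)
            if not chunk:
                self.eof = True
                break
            self.cache.write(chunk)
            self.cached += len(chunk)

    def read(self, size=-1):
        read_all = size is None or size < 0
        self._fill(None if read_all else self.pos + size)
        self.cache.seek(self.pos)
        data = self.cache.read() if read_all else self.cache.read(size)
        self.pos += len(data)
        return data

    def readline(self):
        # Fetch one more chunk at a time until a full line is cached.
        while True:
            self.cache.seek(self.pos)
            line = self.cache.readline()
            if line.endswith(b'\n') or self.eof:
                self.pos += len(line)
                return line
            self._fill(self.cached + 1)

    def seek(self, offset):
        # Absolute offsets only; seeking forward pulls the missing data first.
        self._fill(offset)
        self.pos = min(offset, self.cached)

    def tell(self):
        return self.pos


# Stand-in for a real response object.
fake_response = io.BytesIO(b'first line\nsecond line\nthird')
reader = CachingReader(fake_response, chunk_size=4)
first = reader.readline()   # pulls only as many chunks as the line needs
reader.seek(0)
again = reader.readline()   # served from the cache
rest = reader.read()        # pulls the remainder of the stream
```

The trade-off is memory: the whole body ends up in the cache once it has been read, whereas BufferedFile above keeps only the beginning of the stream.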