Does python 3 have a structure for making a filtering stream? In particular, my goal here is to calculate an md5 checksum of the contents read from a REST service with requests without making an extra copy. If I could subclass some sort of filter stream and just shove the bytes into a hashlib-derived md5 object I'd be good.
Currently, my code includes:
shutil.copyfileobj(r.raw, outstream)
where 'r' is the response object. Can I wrap a generator or some such thing around r.raw that will be called with each buffer of data as read, so that I can then pass it into md5?
requests supports reading URL data in chunks, and the hashlib library lets you calculate a MD5 in chunks, so you have everything you need right there already. You can choose between .iter_lines() or .iter_content():
import requests
import hashlib
r = requests.get(url, stream=True)
sig = hashlib.md5()
for line in r.iter_lines():
sig.update(line)
print(sig.hexdigest())
If you have to view it as a filter, use a generator:
class MD5TransparentFilter:
def __init__(self, source):
self._sig = hashlib.md5()
self._source = source
def __iter__(self):
for line in self._source:
self._sig.update(line)
yield line
def hexdigest(self):
return self._sig.hexdigest()
then use that on your .iter_lines() or .iter_content() iterator:
r = requests.get(url, stream=True)
filtered = MD5TransparentFilter(r.iter_content(1000))
for line in filtered:
# do something with the line
print(filtered.hexdigest())
For shutil.copyfileobj() you'd need to implement a .read() interface instead of .__iter__(), but the principles are the same:
class MD5TransparentFile:
def __init__(self, source):
self._sig = hashlib.md5()
self._source = source
def read(self, buffer):
# we ignore the buffer size, just use the `.next()` value in the source iterator
try:
line = self._source.next()
self._sig.update(line)
return line
except StopIteration:
return b''
def hexdigest(self):
return self._sig.hexdigest()
The MD5TransparentFile() class takes your .iter_content() or .iter_lines() iterator, and return data from that on each call to .read(), as well as calculate the MD5 on the fly. This can be used directly for your shutil.copyfileobj() example.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With