When I loop over the lines of a set of gzipped files with the module fileinput like this:
for line in fileinput.FileInput(files=gzipped_files,openhook=fileinput.hook_compressed):
Then those lines are byte strings and not text strings.
When using the module gzip this can be prevented by opening the files with 'rt' instead of 'rb': http://bugs.python.org/issue13989
Is there a similar fix for the module fileinput, so I can have it return text strings instead of byte strings? I tried adding mode='rt', but then I get this error:
ValueError: FileInput opening mode must be one of 'r', 'rU', 'U' and 'rb'
You'd have to implement your own openhook function to open the files with a codec:
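import os

def hook_compressed_text(filename, mode, encoding='utf8'):
    # Pick the opener from the file extension, like fileinput.hook_compressed does.
    ext = os.path.splitext(filename)[1]
    if ext == '.gz':
        import gzip
        # fileinput passes mode 'r' by default, so this opens the file as 'rt'.
        return gzip.open(filename, mode + 't', encoding=encoding)
    elif ext == '.bz2':
        import bz2
        return bz2.open(filename, mode + 't', encoding=encoding)
    else:
        # Uncompressed files are opened as ordinary text files.
        return open(filename, mode, encoding=encoding)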
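To see the hook in use, here is a minimal sketch that passes it as openhook in place of fileinput.hook_compressed. The temporary file, the sample text, and the names tmp_dir, sample_path and lines are made up for illustration; the hook is repeated so the snippet runs on its own.

```python
import fileinput
import gzip
import os
import tempfile


def hook_compressed_text(filename, mode, encoding="utf8"):
    """Open plain, .gz, or .bz2 files in text mode (same idea as the hook above)."""
    ext = os.path.splitext(filename)[1]
    if ext == ".gz":
        return gzip.open(filename, mode + "t", encoding=encoding)
    elif ext == ".bz2":
        import bz2
        return bz2.open(filename, mode + "t", encoding=encoding)
    return open(filename, mode, encoding=encoding)


# Hypothetical sample data: write a small gzipped file into a temp directory.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.gz")
with gzip.open(sample_path, "wt", encoding="utf8") as fh:
    fh.write("first line\nsecond line\n")

# Pass the custom hook instead of fileinput.hook_compressed.
lines = []
with fileinput.FileInput(files=[sample_path], openhook=hook_compressed_text) as stream:
    for line in stream:
        lines.append(line)

# Every line should now be a text string rather than a byte string.
print(all(isinstance(line, str) for line in lines))
```

Note that the hook appends 't' to whatever mode fileinput passes in, so it assumes the default mode 'r'; with mode='rb' the resulting 'rbt' would be rejected by gzip.open.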
Coming a bit late to the party, but wouldn't it be simpler to do this?
for line in fileinput.FileInput(files=gzipped_files, openhook=fileinput.hook_compressed):
    if isinstance(line, bytes):
        line = line.decode()
    ...
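If you go the decode route, it may help to wrap it in a small generator so the encoding is stated explicitly instead of relying on the default of decode(). This is only a sketch: iter_text_lines, the sample file, and the UTF-8 default are assumptions for illustration, not part of fileinput.

```python
import fileinput
import gzip
import os
import tempfile


def iter_text_lines(paths, encoding="utf-8"):
    """Yield decoded lines from plain or compressed files (hypothetical helper)."""
    # mode="rb" keeps every stream binary, so decoding happens in one place.
    with fileinput.FileInput(files=paths, mode="rb",
                             openhook=fileinput.hook_compressed) as stream:
        for raw in stream:
            yield raw.decode(encoding)


# Hypothetical sample data: a gzipped file containing non-ASCII UTF-8 text.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.gz")
with gzip.open(sample_path, "wb") as fh:
    fh.write("caf\u00e9\nna\u00efve\n".encode("utf-8"))

decoded = list(iter_text_lines([sample_path]))
print(decoded)
```

Decoding line by line is fine for ASCII-compatible encodings such as UTF-8, where the newline byte never occurs inside a multi-byte character. For something like UTF-16 the byte-level line splitting would cut characters apart, and a text-mode hook is the safer choice.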