Python gzipped fileinput returns binary string instead of text string

When I loop over the lines of a set of gzipped files with the module fileinput like this:

for line in fileinput.FileInput(files=gzipped_files, openhook=fileinput.hook_compressed):
    ...

Then those lines are byte strings and not text strings.

With the gzip module, this can be avoided by opening the files in mode 'rt' instead of 'rb': http://bugs.python.org/issue13989
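For reference, here is a minimal sketch of the gzip behaviour described above. The helper name `read_first_line` and the sample file are made up for illustration; only `gzip.open` and its 'rb'/'rt' modes come from the standard library.

```python
import gzip
import os
import tempfile

def read_first_line(path, mode):
    # Open the gzip file in the given mode and return its first line.
    with gzip.open(path, mode) as handle:
        return handle.readline()

# Write a small throwaway .gz file to compare both modes on.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as handle:
    handle.write("hello\n")

binary_line = read_first_line(sample_path, "rb")  # a bytes object
text_line = read_first_line(sample_path, "rt")    # a str object
```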

Is there a similar fix for the module fileinput, so I can have it return text strings instead of byte strings? I tried adding mode='rt', but then I get this error:

ValueError: FileInput opening mode must be one of 'r', 'rU', 'U' and 'rb'
tommy.carstensen asked Feb 03 '14 13:02


2 Answers

You'd have to implement your own openhook function that opens the files in text mode with an explicit encoding:

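import os

def hook_compressed_text(filename, mode, encoding='utf8'):
    # Assumes FileInput's default mode 'r', so mode + 't' becomes 'rt'
    # (text mode); with mode='rb' the result would not be a valid mode.
    ext = os.path.splitext(filename)[1]
    if ext == '.gz':
        import gzip
        return gzip.open(filename, mode + 't', encoding=encoding)
    elif ext == '.bz2':
        import bz2
        return bz2.open(filename, mode + 't', encoding=encoding)
    else:
        # Uncompressed files are opened as ordinary text files.
        return open(filename, mode, encoding=encoding)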
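As a rough usage sketch, the hook is passed to FileInput through the openhook argument. The hook is repeated below only so the example runs on its own; `read_text_lines` and the sample files are hypothetical names introduced for illustration, and FileInput is assumed to be left at its default mode 'r'.

```python
import bz2
import fileinput
import gzip
import os
import tempfile

def hook_compressed_text(filename, mode, encoding="utf8"):
    # Same idea as the hook above: append 't' to FileInput's default
    # mode 'r' so that gzip/bz2 return str lines instead of bytes.
    ext = os.path.splitext(filename)[1]
    if ext == ".gz":
        return gzip.open(filename, mode + "t", encoding=encoding)
    elif ext == ".bz2":
        return bz2.open(filename, mode + "t", encoding=encoding)
    else:
        return open(filename, mode, encoding=encoding)

def read_text_lines(paths):
    # Hypothetical helper: collect all lines from the given files as str.
    with fileinput.FileInput(files=paths, openhook=hook_compressed_text) as stream:
        return list(stream)

# Throwaway sample files so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "a.txt.gz")
bz2_path = os.path.join(tmp_dir, "b.txt.bz2")
with gzip.open(gz_path, "wt", encoding="utf8") as handle:
    handle.write("first\nsecond\n")
with bz2.open(bz2_path, "wt", encoding="utf8") as handle:
    handle.write("third\n")

lines = read_text_lines([gz_path, bz2_path])
```

Every element of `lines` is then a text string, whichever compression the source file used.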
Martijn Pieters answered Nov 16 '22 01:11


Coming a bit late to the party, but wouldn't it be simpler to do this?

for line in fileinput.FileInput(files=gzipped_files, openhook=fileinput.hook_compressed):
    if isinstance(line, bytes):
        line = line.decode()
    ...
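One caveat with this approach: `line.decode()` with no argument decodes as UTF-8, so files in another encoding need the codec passed explicitly. A small sketch of that idea follows; `as_text` and `decoded_lines` are made-up helper names, and the sample data stands in for lines coming from FileInput.

```python
def as_text(line, encoding="utf-8"):
    # Decode bytes with an explicit codec; pass str through unchanged.
    if isinstance(line, bytes):
        return line.decode(encoding)
    return line

def decoded_lines(lines, encoding="utf-8"):
    # Lazily wrap any iterable of lines, e.g. a fileinput.FileInput object.
    for line in lines:
        yield as_text(line, encoding)

# Mixed bytes/str input stands in for lines read from compressed
# and uncompressed files.
sample = [b"caf\xc3\xa9\n", "plain\n"]
result = list(decoded_lines(sample))
```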
Huw Walters answered Nov 16 '22 00:11