
Catch UnicodeDecodeError exception while reading file line by line in Python 3

Consider the following code:

with open('file.txt', 'r') as f:
    for line in f:
        print(line)

In Python 3, a file opened in text mode decodes the bytes it reads into str, which can raise exceptions such as UnicodeDecodeError. These can of course be caught with a try ... except block around the whole loop, but I would like to handle them on a per-line basis.

Question: Is there a way to directly catch and handle exceptions for each line that is read, ideally without changing the simple syntax of iterating over the file too much?

piripiri asked Nov 23 '17 10:11


2 Answers

The Pythonic way is probably to register an error handler with codecs.register_error('special', handler) and pass its name as the errors argument of the open function:

with open('file.txt', 'r', errors='special') as f:
    ...

That way, whenever the decoder hits bytes it cannot decode, the handler will be called with the UnicodeDecodeError and can either return a tuple of (replacement string, position at which to resume decoding) or re-raise the error.
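
As a minimal sketch of such a handler, the following records the undecodable bytes and substitutes U+FFFD for them. The names report_and_replace and bad_chunks and the sample bytes are made up for illustration, and the file is assumed to be UTF-8.

```python
import codecs
import os
import tempfile

bad_chunks = []  # hypothetical log of the byte sequences that failed to decode


def report_and_replace(exc):
    """Record the undecodable bytes and substitute U+FFFD for them."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # this sketch only deals with decoding errors
    bad_chunks.append(exc.object[exc.start:exc.end])
    # An error handler returns (replacement text, index where decoding resumes).
    return ("\ufffd", exc.end)


codecs.register_error("special", report_and_replace)

# Demo file with one invalid byte (0xff) on its second line.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ok\nbad \xff byte\nfine\n")

with open(tmp.name, "r", encoding="utf-8", errors="special") as f:
    lines = [line for line in f]

os.remove(tmp.name)
print(lines)
print(bad_chunks)
```

Note that the handler is invoked once per undecodable byte sequence rather than once per line, so it has no direct knowledge of which line it is in.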

If you want the error handling to be more explicit, an alternative is to open the file in binary mode and explicitly decode each line:

with open('file.txt', 'rb') as f:
    for bline in f:
        try:
            line = bline.decode()
            print(line)
        except UnicodeDecodeError as e:
            # process error; here the offending line is reported and skipped
            print(f'skipping undecodable line: {e}')
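
If you would rather keep a plain for loop at the call site, the binary-mode approach can be wrapped in a generator. This is only a sketch: decoded_lines is a hypothetical helper name, UTF-8 is an assumed default, and splitting lines in binary mode is only reliable for ASCII-compatible encodings such as UTF-8 (not UTF-16).

```python
import os
import tempfile


def decoded_lines(path, encoding="utf-8"):
    """Yield (line, error) pairs; exactly one element of each pair is None."""
    with open(path, "rb") as f:
        for raw in f:
            try:
                yield raw.decode(encoding), None
            except UnicodeDecodeError as exc:
                # Hand the error to the caller instead of aborting the loop.
                yield None, exc


# Demo file with one invalid byte (0xff) on its second line.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ok\nbad \xff byte\nfine\n")

results = list(decoded_lines(tmp.name))
os.remove(tmp.name)

for line, error in results:
    if error is not None:
        print("could not decode line:", error)
    else:
        print(line, end="")
```

The caller decides per line what to do with a failure, and lines after the bad one are still read normally.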

Serge Ballesta answered Nov 11 '22 00:11


Instead of employing a for loop, you could call next on the file-iterator yourself and catch the StopIteration manually.

with open('file.txt', 'r') as f:
    while True:
        try:
            line = next(f)
            # code
        except StopIteration:
            break
        except UnicodeDecodeError:
            # code that handles the undecodable line
            continue

timgeb answered Nov 10 '22 23:11