Consider the following code:
    with open('file.txt', 'r') as f:
        for line in f:
            print(line)
In Python 3, the interpreter tries to decode the strings it reads, which might lead to exceptions like UnicodeDecodeError. These can of course be caught with a try ... except block around the whole loop, but I would like to handle them on a per-line basis.
Question: Is there a way to directly catch and handle exceptions for each line that is read? Hopefully without changing the simple syntax of iterating over the file too much?
The Pythonic way is probably to register an error handler with codecs.register_error('special', handler) and declare it through the errors parameter of the open function:

    with open('file.txt', 'r', errors='special') as f:
        ...

That way, if there is an offending line, the handler will be called with the UnicodeDecodeError and can either return a tuple of a replacement string and the position at which decoding should resume, or re-raise the error.
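To make the handler approach concrete, here is a minimal sketch. The handler name special_handler, the '?' replacement policy, and the throwaway temporary file are assumptions made for illustration; only codecs.register_error and the errors parameter of open come from the answer itself.

```python
import codecs
import os
import tempfile


def special_handler(exc):
    """Hypothetical handler: substitute '?' for each undecodable byte."""
    if isinstance(exc, UnicodeDecodeError):
        bad_bytes = exc.object[exc.start:exc.end]
        # Return (replacement text, position at which decoding resumes).
        return "?" * len(bad_bytes), exc.end
    # Anything else (e.g. an encoding error) is not handled here.
    raise exc


codecs.register_error("special", special_handler)

# Build a small throwaway file containing one invalid UTF-8 byte (0xFF).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ok line\nbad \xff byte\n")
    path = tmp.name

# The usual iteration syntax is unchanged; only errors= is added.
with open(path, "r", encoding="utf-8", errors="special") as f:
    lines = [line for line in f]

os.remove(path)
print(lines)
```

The handler only sees the exception object, not the line as a whole, so this route suits replacement or logging better than skipping entire lines.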
If you want more explicit handling, an alternative is to open the file in binary mode and explicitly decode each line:

    with open('file.txt', 'rb') as f:
        for bline in f:
            try:
                line = bline.decode()  # decodes as UTF-8 by default
                print(line)
            except UnicodeDecodeError as e:
                # process error (placeholder: report it and move on)
                print(f'could not decode line: {e}')
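If keeping the plain "for line in ..." syntax matters, the binary-mode idea can be wrapped in a generator. The following is only a sketch: decoded_lines and its on_error callback are hypothetical names, not part of any library, and splitting the raw bytes on b'\n' assumes an ASCII-compatible encoding such as UTF-8 (it would not work for UTF-16).

```python
import io


def decoded_lines(binary_file, encoding="utf-8", on_error=None):
    """Yield decoded lines from a file opened in binary mode.

    on_error (hypothetical callback) receives the line number, the raw
    bytes, and the exception. It may return a replacement string, or
    None to skip the line.
    """
    for lineno, raw in enumerate(binary_file, start=1):
        try:
            yield raw.decode(encoding)
        except UnicodeDecodeError as exc:
            if on_error is not None:
                replacement = on_error(lineno, raw, exc)
                if replacement is not None:
                    yield replacement


# In-memory stand-in for open('file.txt', 'rb'); line 2 has an invalid byte.
sample = io.BytesIO(b"first\nbroken \xff\nlast\n")

result = []
for line in decoded_lines(
    sample, on_error=lambda n, raw, exc: f"<line {n} undecodable>\n"
):
    result.append(line)

print(result)
```

The calling loop stays a simple for loop, and the per-line error policy lives in one place.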
Instead of employing a for loop, you could call next on the file iterator yourself and catch the StopIteration manually.
    with open('file.txt', 'r') as f:
        while True:
            try:
                line = next(f)
                # code that uses line
            except StopIteration:
                break  # end of file reached
            except UnicodeDecodeError:
                pass  # code that handles the decoding error