 

string.decode custom errors argument

Tags:

python

I have this Python 2.7 code:

# coding: utf-8
#
f = open('data.txt', 'r')

for line in f:
  line = line.decode(encoding='utf-8', errors='foo23')
  print len(line)

f.close()

Why doesn't Python raise an error here, given that the only registered error handlers are:

  • strict
  • ignore
  • replace
  • xmlcharrefreplace
  • backslashreplace

The documentation says that you can register your own, but I did not register 'foo23', and the code still runs without an error or warning. If I change the encoding argument to an unknown name, Python raises an error, but if I change errors to an arbitrary string, everything is OK.

line = line.decode(encoding='utf-9', errors='foo23')

 File "parse.py", line 7, in <module>
line = line.decode(encoding='utf-9', errors='foo23')
LookupError: unknown encoding: utf-9
broadband asked Jan 31 '13 09:01


2 Answers

If no error occurs during decoding, the errors parameter is never used, so its value doesn't matter as long as it is a string:

>>> b'\x09'.decode('utf-8', errors='abc')
u'\t'

If the bytes can't be decoded using the given encoding, the error handler is looked up, and you get an error if you specified a non-existent handler:

>>> b'\xff'.decode('utf-8', errors='abc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "../lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
LookupError: unknown error handler name 'abc'
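
Because the handler name is only looked up when undecodable bytes turn up, a misspelled name can go unnoticed for a long time. One way to catch it early is to check the name with codecs.lookup_error() before decoding. The following is a minimal sketch, not part of the original answer. The helper name is_registered_error_handler and the handler question_mark_handler are made up for illustration, and 'foo23' is registered here only to show what registering a custom handler looks like:

```python
import codecs

def is_registered_error_handler(name):
    # codecs.lookup_error() raises LookupError for unknown handler names.
    try:
        codecs.lookup_error(name)
    except LookupError:
        return False
    return True

# 'foo23' has not been registered yet, so the lookup fails.
before = is_registered_error_handler('foo23')

def question_mark_handler(exc):
    # Hypothetical handler: replace each undecodable run with '?'
    # and resume decoding right after it.
    if isinstance(exc, UnicodeDecodeError):
        return (u'?', exc.end)
    raise exc

codecs.register_error('foo23', question_mark_handler)

# After registration the name is known and can be used for decoding.
after = is_registered_error_handler('foo23')
decoded = b'\xff'.decode('utf-8', 'foo23')
```

With the handler registered, decoding b'\xff' no longer raises LookupError, because the name now resolves to a callable.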
jfs answered Oct 26 '22 21:10


The errors keyword argument tells str.decode() how you want decoding errors handled; it won't raise anything by itself. You get an error in your second example only because you passed an invalid encoding argument to the function.
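
As a small illustration of what "how you want errors handled" means in practice, the sketch below decodes the same invalid input with three of the built-in handlers. The sample bytes are made up for the example:

```python
# One invalid byte (0xff) in the middle of otherwise valid ASCII.
data = b'abc\xffdef'

# 'ignore' silently drops the undecodable byte.
ignored = data.decode('utf-8', 'ignore')

# 'replace' substitutes U+FFFD (the replacement character).
replaced = data.decode('utf-8', 'replace')

# 'strict' (the default) raises UnicodeDecodeError.
try:
    data.decode('utf-8', 'strict')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

In every case the handler only comes into play because the input contains a byte that is not valid UTF-8. With clean input, all three calls would return the same string.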

cms_mgr answered Oct 26 '22 21:10