
Is it possible for str.encode(encoding='utf-8', errors='strict') to raise UnicodeError?

I am writing some code that needs to work with both Py2.7 and Py3.7+.

I need to write text to a file using UTF-8 encoding. My code looks like this:

import six
...
content = ...
if isinstance(content, six.string_types):
    content = content.encode(encoding='utf-8', errors='strict')

# write 'content' to file

Above, is it possible for content.encode() to raise UnicodeError on either Py2.7 or Py3.7+? I cannot think of a scenario where this is possible, but I am not a Python expert, so I suspect there must be an edge case.

Here is my reasoning why I think it will never raise UnicodeError:

  • six.string_types covers three types: Py2.7 str & unicode, Py3.7+ str
  • All of these types can always encode as UTF-8.
kevinarpe asked Feb 24 '26 21:02


1 Answer

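Yes, it's possible. In Python 3, a str can contain lone surrogate code points (U+D800 to U+DFFF), which UTF-8 cannot encode under errors='strict':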

import six

content = ''.join(map(chr, range(0x110000)))
if isinstance(content, six.string_types):
    content = content.encode(encoding='utf-8', errors='strict')

Result (run on Try it online!, using Python 3.7.4):

Traceback (most recent call last):
  File ".code.tio", line 5, in <module>
    content = content.encode(encoding='utf-8', errors='strict')
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 55296-57343: surrogates not allowed

And UnicodeEncodeError is a subclass of UnicodeError, so this counts.
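
For a smaller reproduction, a single lone surrogate is enough. The following is only a sketch for Python 3 (the variable names are illustrative, not from the question). It shows that catching UnicodeError also catches the UnicodeEncodeError, and what two of the non-strict error handlers produce instead:

```python
# Python 3 only: a str holding one lone surrogate code point (U+D800).
lone_surrogate = '\ud800'

try:
    lone_surrogate.encode(encoding='utf-8', errors='strict')
    raised = None
except UnicodeError as exc:  # the base class also catches UnicodeEncodeError
    raised = exc

print(type(raised).__name__)                         # UnicodeEncodeError
print(issubclass(UnicodeEncodeError, UnicodeError))  # True

# Non-strict handlers avoid the exception, but they either lose
# information or emit bytes that are not valid UTF-8.
replaced = lone_surrogate.encode('utf-8', errors='replace')      # b'?'
passed = lone_surrogate.encode('utf-8', errors='surrogatepass')  # b'\xed\xa0\x80'
```

Whether to keep errors='strict' and catch the exception, or to switch to a lossy handler, depends on where the text comes from. Strings produced by os.fsdecode() or decoded with errors='surrogateescape' are a common source of lone surrogates.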

Kelly Bundy answered Feb 26 '26 12:02