Is it possible for str.encode(encoding='utf-8', errors='strict') to raise UnicodeError?

Question

I am writing some code that needs to work with both Py2.7 and Py3.7+.

I need to write text to a file using UTF-8 encoding. My code looks like this:

import six
...
content = ...
if isinstance(content, six.string_types):
    content = content.encode(encoding='utf-8', errors='strict')

# write 'content' to file

Above, is it possible for content.encode() to raise UnicodeError in either Py2.7 or Py3.7+? I cannot think of a scenario where this is possible, but I am not a Python expert, so I suspect there must be an edge case.

Here is my reasoning why I think it will never raise UnicodeError:

six.string_types covers three types: Py2.7 str & unicode, Py3.7+ str
All of these types can always encode as UTF-8.

Kelly Bundy · Accepted Answer

Yes, it's possible:

import six

content = ''.join(map(chr, range(0x110000)))  # every code point, surrogates included
if isinstance(content, six.string_types):
    content = content.encode(encoding='utf-8', errors='strict')

Result (Try it online!, using Python 3.7.4):

Traceback (most recent call last):
  File ".code.tio", line 5, in <module>
    content = content.encode(encoding='utf-8', errors='strict')
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 55296-57343: surrogates not allowed
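
The positions 55296-57343 in the traceback are the code points U+D800 to U+DFFF, the surrogate range, so a single lone surrogate should be enough to trigger the same error. A minimal sketch for Python 3 only (the variable names are illustrative and not part of the original answer):

```python
# Python 3 only: a lone surrogate is a legal str element but has no UTF-8 encoding.
lone = '\ud800'

raised = False
reason = None
try:
    lone.encode(encoding='utf-8', errors='strict')
except UnicodeEncodeError as exc:
    raised = True
    reason = exc.reason  # short description supplied by the codec

print(raised, reason)

# One way such strings can arise in practice: undecodable bytes that were
# mapped to lone surrogates by the 'surrogateescape' error handler.
escaped = b'\xff'.decode('utf-8', errors='surrogateescape')
# Encoding with the same handler restores the original byte.
roundtrip = escaped.encode('utf-8', errors='surrogateescape')
```

Strings produced with 'surrogateescape' (for example, file names returned by the os module on POSIX systems) are a plausible real-world source of this edge case.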

And every UnicodeEncodeError is a UnicodeError, because UnicodeEncodeError is a subclass of UnicodeError.
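
The subclass relationship can be checked directly, and it means that catching UnicodeError is sufficient to handle the failure. The sketch below assumes Python 3; encode_utf8 is a hypothetical helper introduced only for illustration, and returning None on failure is one possible policy, not a recommendation from the original answer.

```python
def encode_utf8(text, errors='strict'):
    """Hypothetical helper: encode text as UTF-8, or return None if encoding fails."""
    try:
        return text.encode('utf-8', errors)
    except UnicodeError:  # also catches UnicodeEncodeError
        return None


is_subclass = issubclass(UnicodeEncodeError, UnicodeError)

ok = encode_utf8('caf\u00e9')                        # ordinary text encodes fine
failed = encode_utf8('\ud800')                       # lone surrogate, strict mode
replaced = encode_utf8('\ud800', 'replace')          # substitutes '?'
passed = encode_utf8('\ud800', 'surrogatepass')      # emits the surrogate's raw bytes
```

Which error handler is appropriate depends on whether the file must contain valid UTF-8: 'replace' loses information, while 'surrogatepass' writes bytes that strict UTF-8 decoders reject. On Python 2.7 there is a separate failure mode: calling .encode('utf-8') on a byte str that contains non-ASCII bytes first decodes it implicitly as ASCII, which raises UnicodeDecodeError, also a subclass of UnicodeError.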

Tags: python, python-3.x, python-unicode, python-2.7

kevinarpe

