Converting octet strings to Unicode strings, Python 3

Question

I'm trying to convert a string with octal-escaped Unicode back into a proper Unicode string as follows, using Python 3:

"training\345\256\214\346\210\220\345\276\214.txt" is the read-in string.

"training完成後.txt" is the string's actual representation, which I'm trying to obtain.

However, after skimming SO, the solution suggested almost everywhere I could find for Python 3 was the following:

decoded_string = bytes(myString, "utf-8").decode("unicode_escape")

Unfortunately, that seems to yield the wrong Unicode string when applied to my sample:

'trainingÃ¥Â®Â\x8cÃ¦Â\x88Â\x90Ã¥Â¾Â\x8c.txt'

This seems easy to do with byte literals, as well as in Python 2, but unfortunately doesn't seem as easy with strings in Python 3. Help much appreciated, thanks! :)

Mark Tolonen · Accepted Answer

Assuming your starting string is a Unicode string with literal backslashes, you first need to encode it to a byte string in order to use the unicode-escape codec. That codec turns each octal escape into a single code point, but the escapes are really UTF-8 bytes, so you need to encode the result back to a byte string and then decode it as UTF-8:

>>> s = r'training\345\256\214\346\210\220\345\276\214.txt'
>>> s
'training\\345\\256\\214\\346\\210\\220\\345\\276\\214.txt'
>>> s.encode('latin1')
b'training\\345\\256\\214\\346\\210\\220\\345\\276\\214.txt'
>>> s.encode('latin1').decode('unicode-escape')
'trainingå®\x8cæ\x88\x90å¾\x8c.txt'
>>> s.encode('latin1').decode('unicode-escape').encode('latin1')
b'training\xe5\xae\x8c\xe6\x88\x90\xe5\xbe\x8c.txt'
>>> s.encode('latin1').decode('unicode-escape').encode('latin1').decode('utf8')
'training完成後.txt'
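
The same chain can be wrapped in a small function. This is only a sketch: the name decode_octal_utf8 is made up for illustration, and it assumes the input contains nothing above U+00FF and that every escape is at most \377, so that both latin1 encoding steps succeed.

```python
def decode_octal_utf8(text: str) -> str:
    """Decode literal backslash-octal escapes that spell out UTF-8 bytes.

    Hypothetical helper; assumes every character in `text` is <= U+00FF.
    """
    # Step 1: str -> bytes, so the unicode-escape codec can be applied.
    raw = text.encode("latin1")
    # Step 2: interpret the backslash escapes; each octal escape up to \377
    # becomes one code point in the range U+0000..U+00FF.
    unescaped = raw.decode("unicode-escape")
    # Step 3: map those code points back to bytes, giving the UTF-8 sequence.
    utf8_bytes = unescaped.encode("latin1")
    # Step 4: decode the recovered bytes as UTF-8.
    return utf8_bytes.decode("utf-8")


sample = r"training\345\256\214\346\210\220\345\276\214.txt"
result = decode_octal_utf8(sample)
print(result)  # training完成後.txt
```

If the input already contains characters above U+00FF, the first encode step raises UnicodeEncodeError, so this only fits strings shaped like the one in the question.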

Note that the latin1 codec does a direct translation of Unicode codepoints U+0000 to U+00FF to bytes 00-FF.
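
That one-to-one mapping is easy to check for yourself. A minimal check, with variable names chosen only for illustration:

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin1")

# Each decoded character's code point equals the byte it came from.
same_values = [ord(ch) for ch in text] == list(all_bytes)
# Encoding again returns the original bytes unchanged.
round_trip = text.encode("latin1") == all_bytes

print(same_values, round_trip)  # True True
```

This is why latin1 works as the bridge between str and bytes in the steps above: it never changes a value, it only changes the type.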
