Python: how to convert string with \unnnn escapes to Unicode string? [duplicate]

Question

I am using Python, and my code needs to convert a string that represents Unicode characters as \uXXXX escapes (for example \u1234) back into the original string.

Here is the string I received from other code:

\u6b22\u8fce\u63d0\u4ea4\u5fae\u535a\u641c\u7d22\u4f7f\u7528\u53cd\u9988\uff0c\u8bf7\u76f4\u63a5

I need to convert it back to the original string. How can I do that?

Mark Tolonen · Accepted Answer

I think this is what you want (Python 2). The input isn't really a UTF-8 byte string (well, technically it is, but only because ASCII is a subset of UTF-8).

>>> s='\u6b22\u8fce\u63d0\u4ea4\u5fae\u535a\u641c\u7d22\u4f7f\u7528\u53cd\u9988\uff0c\u8bf7\u76f4\u63a5'
>>> print s.decode('unicode-escape')
欢迎提交微博搜索使用反馈，请直接
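
The snippet above is Python 2, where a plain string literal is a byte string with a .decode() method. As a minimal sketch for Python 3, assuming the input arrives as a str that contains literal backslash-u sequences and nothing outside ASCII, the same codec can be reached by going through bytes first. The variable names here are illustrative only:

```python
# Python 3 sketch: s is a str holding literal \uXXXX escapes (raw string literal).
s = r'\u6b22\u8fce\u63d0\u4ea4\u5fae\u535a\u641c\u7d22\u4f7f\u7528\u53cd\u9988\uff0c\u8bf7\u76f4\u63a5'

# str has no .decode() in Python 3, so convert to ASCII bytes first.
# Assumption: s is pure ASCII; encode('ascii') raises UnicodeEncodeError otherwise.
decoded = s.encode('ascii').decode('unicode_escape')

print(decoded)
```

If the input may already contain non-ASCII characters next to the escapes, this approach is not a good fit, because the unicode_escape codec only operates on bytes.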

FYI, this is what the same text looks like encoded as UTF-8:

>>> s.decode('unicode-escape').encode('utf8')
'\xe6\xac\xa2\xe8\xbf\x8e\xe6\x8f\x90\xe4\xba\xa4\xe5\xbe\xae\xe5\x8d\x9a\xe6\x90\x9c\xe7\xb4\xa2\xe4\xbd\xbf\xe7\x94\xa8\xe5\x8f\x8d\xe9\xa6\x88\xef\xbc\x8c\xe8\xaf\xb7\xe7\x9b\xb4\xe6\x8e\xa5'
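
To make the difference between the escaped text and its UTF-8 encoding concrete, here is a small Python 3 sketch, again assuming the input is an ASCII-only str with literal escapes. It decodes the escapes, encodes the result as UTF-8, and checks that the bytes decode back to the same text:

```python
# Python 3 sketch: escaped text -> str -> UTF-8 bytes -> str.
s = r'\u6b22\u8fce\u63d0\u4ea4\u5fae\u535a\u641c\u7d22\u4f7f\u7528\u53cd\u9988\uff0c\u8bf7\u76f4\u63a5'

text = s.encode('ascii').decode('unicode_escape')  # the 16 decoded characters
utf8_bytes = text.encode('utf-8')                  # each of them takes 3 bytes here

print(utf8_bytes[:3])              # UTF-8 bytes of the first character
print(len(text), len(utf8_bytes))  # character count vs. byte count

# Decoding the UTF-8 bytes yields the original text again.
round_trip = utf8_bytes.decode('utf-8')
```

The escaped form and the UTF-8 form are two different serializations of the same text: the first uses six ASCII characters per code point, the second uses three bytes per character for this particular string.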

Tisho · Answer

If I understand the question correctly, we have a plain byte string that contains Unicode escape sequences, something like this:

a = '\u6b22\u8fce\u63d0\u4ea4\u5fae\u535a\u641c\u7d22\u4f7f\u7528\u53cd\u9988\uff0c\u8bf7\u76f4\u63a5'

In [122]: a
Out[122]: '\u6b22\u8fce\u63d0\u4ea4\u5fae\u535a\u641c\u7d22\u4f7f\u7528\u53cd\u9988\uff0c\u8bf7\u76f4\u63a5'

So we need to manually parse the Unicode code points out of the string and convert each one to its character:

\u6b22 => unichr(0x6b22) # 欢

Or, for the whole string:

# Assumes the string consists solely of 6-character \uXXXX escapes.
print "".join(unichr(int(a[i+2:i+6], 16)) for i in range(0, len(a), 6))
欢迎提交微博搜索使用反馈，请直接
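
The fixed-stride slicing above only works when every six characters form one escape. As a hedged variation (not part of the original answer), a regular expression can locate the escapes instead, so ordinary text mixed in with them is left alone. This is a Python 3 sketch; the helper name decode_u_escapes is hypothetical:

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def decode_u_escapes(text):
    # Replace each escape with the character for that code point.
    # chr() is the Python 3 counterpart of Python 2's unichr().
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

mixed = r'Hello, \u6b22\u8fce!'
print(decode_u_escapes(mixed))
```

This sketch has known limits: it does not combine surrogate pairs into a single character, and it does not treat an escaped backslash in front of a u specially.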

Tags: python, unicode

Bin Chen
