Is there a way to convert a \x escaped string like "\\xe8\\xaa\\x9e\\xe8\\xa8\\x80" into readable form: "語言"?
>>> a = "\\xe8\\xaa\\x9e\\xe8\\xa8\\x80"
>>> print(a)
\xe8\xaa\x9e\xe8\xa8\x80
I am aware that there is a similar question here, but its solution seems to work only for Latin characters. How can I convert a string in this form into readable CJK characters?
Decode it first using 'unicode_escape' to turn the \x escapes into the characters they name, then decode the result as 'utf8':
a = "\\xe8\\xaa\\x9e\\xe8\\xa8\\x80"
decoded = a.encode('latin1').decode('unicode_escape').encode('latin1').decode('utf8')
print(decoded)
# 語言
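Note that only bytes objects can be decoded, so the str has to be encoded before each decode step. 'latin1' is used for this because it maps the code points 0-255 one-to-one onto the byte values 0-255, so the values produced by 'unicode_escape' pass through unchanged and can then be read as UTF-8.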
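To make the chain easier to follow, here is the same conversion split into one step per line. This is only a sketch: the helper name unescape_utf8 is made up for illustration, and it assumes the input contains nothing outside Latin-1 apart from the escapes themselves (otherwise the first encode raises UnicodeEncodeError).

```python
def unescape_utf8(escaped: str) -> str:
    """Hypothetical helper: turn literal \\xNN escapes into the UTF-8 text they spell."""
    # Step 1: str -> bytes, so that there is something to decode.
    raw = escaped.encode('latin1')
    # Step 2: interpret each \xNN escape; the result is a str whose
    # characters all have code points in the range 0-255.
    interpreted = raw.decode('unicode_escape')
    # Step 3: latin1 turns each of those code points into the byte
    # with the same value, giving the real UTF-8 byte sequence.
    utf8_bytes = interpreted.encode('latin1')
    # Step 4: decode the bytes as UTF-8 to get the readable text.
    return utf8_bytes.decode('utf8')


a = "\\xe8\\xaa\\x9e\\xe8\\xa8\\x80"
print(unescape_utf8(a))
# 語言
```

After step 3 the intermediate value is b'\xe8\xaa\x9e\xe8\xa8\x80', which is the UTF-8 encoding of 語言.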