I have a string of the form:
s = '\\xe2\\x99\\xac'
I would like to convert this to the character ♬ by evaluating the escape sequence. However, everything I've tried either results in an error or prints out garbage. How can I force Python to convert the escape sequence into a literal unicode character?
What I've read elsewhere suggests that the following line of code should do what I want, but it results in a UnicodeEncodeError.
print(bytes(s, 'utf-8').decode('unicode-escape'))
I also tried the following, which has the same result:
import codecs
print(codecs.getdecoder('unicode_escape')(s)[0])
Both of these approaches produce the string 'â\x99¬', which print is subsequently unable to handle.
In case it makes any difference, the string is read from a UTF-8 encoded file and will ultimately be written to a different UTF-8 encoded file after processing.
In Python strings, the backslash "\" is a special character, also called the "escape" character. It is used in representing certain whitespace characters: "\t" is a tab, "\n" is a newline, and "\r" is a carriage return. Conversely, prefixing a special character with "\" turns it into an ordinary character. Finally, "\" can be used to escape itself: "\\" is the literal backslash character.
An escape sequence is a sequence of characters that does not represent itself when used inside a character or string literal, but is translated into another character or a sequence of characters that may be difficult or impossible to represent directly.
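The difference between an escape sequence and a literal backslash is what trips up the string in the question. A minimal sketch of that distinction follows; the variable names are illustrative only.

```python
# In a string literal, "\xe2" is an escape sequence: one character, U+00E2.
escaped = "\xe2"

# Doubling the backslash escapes the backslash itself, leaving four
# literal characters: backslash, 'x', 'e', '2'.
literal = "\\xe2"

# A raw string is another way to write the same four characters.
raw = r"\xe2"

print(len(escaped))     # 1
print(len(literal))     # 4
print(literal == raw)   # True
```

The string in the question is of the second kind: it contains the text of the escapes, not the characters they would denote.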
...decode('unicode-escape') will give you the string '\xe2\x99\xac': three characters with code points U+00E2, U+0099 and U+00AC, not the three UTF-8 bytes.
>>> s = '\\xe2\\x99\\xac'
>>> s.encode().decode('unicode-escape')
'â\x99¬'
>>> _ == '\xe2\x99\xac'
True
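To see why this intermediate result looks like garbage, it can help to inspect its code points. The sketch below only uses the string from the question and shows that the three characters carry the same numeric values as the UTF-8 bytes of the target character.

```python
s = '\\xe2\\x99\\xac'
intermediate = s.encode().decode('unicode-escape')

# Three characters whose code points are 0xE2, 0x99 and 0xAC.
code_points = [hex(ord(ch)) for ch in intermediate]
print(code_points)   # ['0xe2', '0x99', '0xac']

# The UTF-8 encoding of U+266C (the target character) consists of
# exactly those three byte values.
target_bytes = '\u266c'.encode('utf-8')
print(list(target_bytes) == [ord(ch) for ch in intermediate])   # True
```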
You need to decode the result as UTF-8. But to do that, first encode it with latin1 (or iso-8859-1), which maps each code point from 0 to 255 to the byte with the same value and so preserves the bytes.
>>> s = '\\xe2\\x99\\xac'
>>> s.encode().decode('unicode-escape').encode('latin1').decode('utf-8')
'♬'
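If this conversion is needed in more than one place, the three steps can be wrapped in a small helper. This is only a sketch: the name unescape_utf8 is made up for illustration, and it assumes every \xNN escape in the input spells out a UTF-8 byte. Input containing escapes for code points above U+00FF (such as \u266c) would make the latin1 step raise UnicodeEncodeError.

```python
def unescape_utf8(text: str) -> str:
    """Turn literal \\xNN escapes that spell UTF-8 bytes into real characters."""
    # Step 1: interpret the backslash escapes; each \xNN becomes U+00NN.
    interpreted = text.encode('utf-8').decode('unicode-escape')
    # Step 2: latin1 maps code points 0-255 to identical byte values,
    # which recovers the raw bytes.
    raw_bytes = interpreted.encode('latin1')
    # Step 3: decode those bytes as the UTF-8 they were meant to be.
    return raw_bytes.decode('utf-8')


result = unescape_utf8('\\xe2\\x99\\xac')
print(result == '\u266c')   # True
print(len(result))          # 1
```

The comparison is printed instead of the character itself because, as the question notes, some consoles cannot print it; writing the result to a UTF-8 encoded file does not have that problem.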