Evaluate UTF-8 literal escape sequences in a string in Python3

Tags: python, string, python-3.x, unicode, utf-8

I have a string of the form:

s = '\\xe2\\x99\\xac'

I would like to convert this to the character ♬ by evaluating the escape sequence. However, everything I've tried either results in an error or prints out garbage. How can I force Python to convert the escape sequence into a literal unicode character?

What I've read elsewhere suggests that the following line of code should do what I want, but it results in a UnicodeEncodeError.

print(bytes(s, 'utf-8').decode('unicode-escape'))

I also tried the following, which has the same result:

import codecs
print(codecs.getdecoder('unicode_escape')(s)[0])

Both of these approaches produce the string 'â\x99¬', which print is subsequently unable to handle.

In case it makes any difference, the string is read from a UTF-8 encoded file and will ultimately be written to a different UTF-8 encoded file after processing.

830

asked Oct 11 '14 05:10

Altay_H

1 Answer

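...decode('unicode-escape') will give you the str '\xe2\x99\xac': three characters whose code points happen to equal the UTF-8 byte values of ♬, not the character itself.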

>>> s = '\\xe2\\x99\\xac'
>>> s.encode().decode('unicode-escape')
'â\x99¬'
>>> _ == '\xe2\x99\xac'
True
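To see why that intermediate result looks like garbage, it can help to inspect its code points. This is only an illustrative sketch; the variable names `intermediate`, `code_points`, and `utf8_bytes` are mine, not part of the question.

```python
s = '\\xe2\\x99\\xac'

# Interpreting the escapes yields three separate characters, one per \xNN.
intermediate = s.encode().decode('unicode-escape')
code_points = [hex(ord(ch)) for ch in intermediate]
print(code_points)  # ['0xe2', '0x99', '0xac']

# Those values are exactly the UTF-8 bytes of the single character U+266C.
utf8_bytes = '\u266c'.encode('utf-8')
print(utf8_bytes)  # b'\xe2\x99\xac'
```

So the data is right, but it is sitting in a str as three code points rather than in a bytes object as three bytes.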

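You still need to decode those values as UTF-8. To do that, first encode the string with latin1 (or iso-8859-1), which maps each code point 0-255 to the byte with the same value and so preserves the bytes, then decode the result as UTF-8.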

>>> s = '\\xe2\\x99\\xac'
>>> s.encode().decode('unicode-escape').encode('latin1').decode('utf-8')
'♬'
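If you need this in more than one place, the chain above could be wrapped in a small helper. This is a minimal sketch; the name `unescape_utf8_bytes` is hypothetical, and it assumes the input only contains escapes that resolve to code points below 256 and that together form valid UTF-8.

```python
def unescape_utf8_bytes(text: str) -> str:
    """Turn literal \\xNN escapes that spell UTF-8 bytes into real characters."""
    # Step 1: interpret the backslash escapes; each \xNN becomes one code point < 256.
    interpreted = text.encode('utf-8').decode('unicode-escape')
    # Step 2: latin1 maps code points 0-255 to identical byte values,
    # which recovers the raw bytes the escapes were describing.
    raw_bytes = interpreted.encode('latin1')
    # Step 3: decode those bytes as the UTF-8 they were meant to be.
    return raw_bytes.decode('utf-8')


result = unescape_utf8_bytes('\\xe2\\x99\\xac')
print(result == '\u266c')  # True
```

Note that the latin1 step raises UnicodeEncodeError if the text contains escapes such as \u266c that produce code points above 255, and the final step raises UnicodeDecodeError if the bytes are not valid UTF-8. When writing the result out, opening the file with an explicit encoding='utf-8' avoids depending on the platform default.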

197

answered Sep 30 '22 15:09

falsetru
