In Python 3, suppose I have <pre class="prettyprint"><code>>>> thai_string = 'สีเ' </code></pre> Using <code>encode</code> gives <pre class="prettyprint"><code>>>> thai_string.encode('utf-8') b'\xe0\xb8\xaa\xe0\xb8\xb5' </code></pre> My question: how can I get <code>encode()</code> to return a <code>bytes</code> sequence using <code>\u</code> instead of <code>\x</code>? And how can I <code>decode</code> them back to a Python 3 <code>str</code> type? I tried using the <code>ascii</code> builtin, which gives <pre class="prettyprint"><code>>>> ascii(thai_string) "'\\u0e2a\\u0e35'" </code></pre> But this doesn't seem quite right, as I can't decode it back to obtain <code>thai_string</code>. Python documentation tells me that <ul> <li> <code>\xhh</code> escapes the character with the hex value <code>hh</code> while </li> <li> <code>\uxxxx</code> escapes the character with the 16-bit hex value <code>xxxx</code> </li> </ul> The documentation says that <code>\u</code> is only used in string literals, but I'm not sure what that means. Is this a hint that my question has a flawed premise?

You can use <code>unicode_escape</code>: <pre class="prettyprint"><code>>>> thai_string.encode('unicode_escape') b'\\u0e2a\\u0e35\\u0e40' </code></pre> Note that <code>encode()</code> will always return a byte string (bytes) and the <code>unicode_escape</code> encoding is intended to: <blockquote> Produce a string that is suitable as Unicode literal in Python source code </blockquote>

How to encode Python 3 string using \u escape code?

Tags:

python

python-3.x

unicode

unicode-escapes

In Python 3, suppose I have

Click to copy

>>> thai_string = 'สีเ'

Using encode gives

Click to copy

>>> thai_string.encode('utf-8')
b'\xe0\xb8\xaa\xe0\xb8\xb5'

My question: how can I get encode() to return a bytes sequence using \u instead of \x? And how can I decode them back to a Python 3 str type?

I tried using the ascii builtin, which gives

Click to copy

>>> ascii(thai_string)
"'\\u0e2a\\u0e35'"

But this doesn't seem quite right, as I can't decode it back to obtain thai_string.

Python documentation tells me that

\xhh escapes the character with the hex value hh while
\uxxxx escapes the character with the 16-bit hex value xxxx

The documentation says that \u is only used in string literals, but I'm not sure what that means. Is this a hint that my question has a flawed premise?

807

asked Aug 28 '15 22:08

Michael Currie

1 Answers

You can use unicode_escape:

Click to copy

>>> thai_string.encode('unicode_escape')
b'\\u0e2a\\u0e35\\u0e40'

Note that encode() will always return a byte string (bytes) and the unicode_escape encoding is intended to:

Produce a string that is suitable as Unicode literal in Python source code

answered Oct 10 '22 02:10

Simeon Visser

Related questions
                            
                                How to center a tkinter widget in a sticky frame
                            
                                Checking code for deprecation warnings
                            
                                pika, stop_consuming does not work
                            
                                django: value has an invalid date format. It must be in YYYY-MM-DD format
                            
                                Canonical way to do bulk create in django-rest-framework 3.x?
                            
                                How can you style Django's file picker form button?
                            
                                Ordered Logit in Python?
                            
                                abortable sleep() in Python
                            
                                Retrieve string version of document by ID in Gensim
                            
                                Python Key Error when setting environment variable in supervisord
                            
                                reverse on @list_route with custom url_path
                            
                                Python igraph: delete vertices from a graph
                            
                                flask running in mod_wsgi cannot write to /tmp
                            
                                does nolearn/lasagne support python 3
                            
                                how to store ipython magic output into variable
                            
                                Django .aggregate() on .annotate()
                            
                                Python, sharing mysql connection in multiple functions - pass connection or cursor?
                            
                                How to perform discrete optimization of functions over matrices?
                            
                                Why should I discard half of what a FFT returns?
                            
                                Annoying generator bug

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to encode Python 3 string using \u escape code?

Tags:

python

python-3.x

unicode

unicode-escapes

Michael Currie

People also ask

1 Answers

Simeon Visser

Recent Activity

Donate For Us