
How to encode Python 3 string using \u escape code?

In Python 3, suppose I have

>>> thai_string = 'สีเ'

Using encode gives

>>> thai_string.encode('utf-8')
b'\xe0\xb8\xaa\xe0\xb8\xb5\xe0\xb9\x80'

My question: how can I get encode() to return a bytes sequence that uses \u escapes instead of \x? And how can I decode it back to a Python 3 str?

I tried using the ascii builtin, which gives

>>> ascii(thai_string)
"'\\u0e2a\\u0e35\\u0e40'"

But this doesn't seem quite right, as I can't decode it back to obtain thai_string.

Python documentation tells me that

  • \xhh escapes the character with the hex value hh while
  • \uxxxx escapes the character with the 16-bit hex value xxxx

The documentation says that \u is only used in string literals, but I'm not sure what that means. Is this a hint that my question has a flawed premise?

Michael Currie asked Aug 28 '15 22:08



1 Answer

You can use unicode_escape:

>>> thai_string.encode('unicode_escape')
b'\\u0e2a\\u0e35\\u0e40'
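The same codec also covers the "decode back" half of the question. The following is a minimal sketch of the round trip, assuming the bytes came from encode('unicode_escape') in the first place; the variable names escaped, restored and escaped_text are only illustrative.

```python
# The same three characters as 'สีเ', written with \u escapes in a str literal
thai_string = '\u0e2a\u0e35\u0e40'

# str -> bytes: pure ASCII, with each non-ASCII character written as \uXXXX
escaped = thai_string.encode('unicode_escape')
print(escaped)        # b'\\u0e2a\\u0e35\\u0e40'

# bytes -> str: decoding with the same codec reverses the transformation
restored = escaped.decode('unicode_escape')
print(restored == thai_string)   # True

# If a str of escapes is wanted instead of bytes, decode the ASCII bytes
escaped_text = escaped.decode('ascii')
print(escaped_text)   # \u0e2a\u0e35\u0e40
```

Here escaped_text is an ordinary str of 18 ASCII characters (three groups of backslash, u and four hex digits), not the original three Thai characters.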

Note that encode() always returns a bytes object, so the doubled backslashes in the output above are just how repr() displays a single backslash. The unicode_escape encoding is intended to:

Produce a string that is suitable as Unicode literal in Python source code
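This also explains the ascii() result from the question: ascii() returns a str containing a quoted, escaped literal, so it has to be parsed as a literal rather than decoded. A hedged sketch, using ast.literal_eval from the standard library as one safe way to do that parsing (the name literal_text is only illustrative):

```python
import ast

# The same three characters as 'สีเ'
thai_string = '\u0e2a\u0e35\u0e40'

# ascii() gives the text of a string literal, quotes included
literal_text = ascii(thai_string)
print(literal_text)   # '\u0e2a\u0e35\u0e40'

# Parsing that text as a literal recovers the original str
recovered = ast.literal_eval(literal_text)
print(recovered == thai_string)   # True

# \u is resolved when a str literal is parsed and stands for one character,
# whereas each \x in the UTF-8 bytes repr stands for one byte
print(len('\u0e2a'))                   # 1
print(len('\u0e2a'.encode('utf-8')))   # 3
```

So \u and \x are not two spellings of the same bytes: \uXXXX names a code point inside a str literal, while the \x sequences in the encode('utf-8') output are the individual bytes of that code point's UTF-8 encoding.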

Simeon Visser answered Oct 10 '22 02:10