
How do I treat an ASCII string as unicode and unescape the escaped characters in it in python?

For example, if I have a unicode string, I can encode it as an ASCII string like so:

>>> u'\u003cfoo/\u003e'.encode('ascii')
'<foo/>'

However, I have e.g. this ASCII string:

'\u003cfoo/\u003e'

... that I want to turn into the same ASCII string as in my first example above:

'<foo/>' 
John asked Nov 06 '08 01:11



1 Answer

It took me a while to figure this one out, but this page had the best answer:

>>> s = '\u003cfoo/\u003e'
>>> s.decode( 'unicode-escape' )
u'<foo/>'
>>> s.decode( 'unicode-escape' ).encode( 'ascii' )
'<foo/>'

There's also a 'raw-unicode-escape' codec to handle the other way to specify Unicode strings -- check the "Unicode Constructors" section of the linked page for more details (since I'm not that Unicode-savvy).
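As a rough illustration of how the two codecs differ (a Python 3 sketch with made-up input, not something taken from the linked page): 'unicode-escape' interprets every backslash escape, while 'raw-unicode-escape' only interprets the \uXXXX and \UXXXXXXXX forms and leaves other backslashes alone.

```python
# Bytes containing literal \u escapes followed by a literal backslash and 'n'.
data = rb'\u003cfoo/\u003e\n'

# unicode_escape: \u003c, \u003e and \n are all interpreted.
full = data.decode('unicode_escape')

# raw_unicode_escape: only the \u escapes are interpreted;
# the trailing backslash-n stays as two separate characters.
raw = data.decode('raw_unicode_escape')

print(repr(full))
print(repr(raw))
```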

EDIT: See also Python Standard Encodings.

hark answered Oct 07 '22 01:10