
How do I convert Unicode escape sequences to Unicode characters in a Python string?

When I try to get the content of a tag using "unicode(head.contents[3])", I get output similar to this: "Christensen Sk\xf6ld". I want the escape sequence to be returned as the actual character it represents. How do I do this in Python?

Vicky asked Jun 13 '09 06:06





1 Answer

Assuming Python 2 sees the name as a normal (byte) string, you'll first have to decode it to unicode:

    >>> name
    'Christensen Sk\xf6ld'
    >>> unicode(name, 'latin-1')
    u'Christensen Sk\xf6ld'

Another way of achieving this:

    >>> name.decode('latin-1')
    u'Christensen Sk\xf6ld'
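In Python 3 the "unicode" builtin no longer exists: raw data is a bytes object and text is a str. A minimal sketch of the same decoding step, assuming the data really is Latin-1 encoded bytes (the variable names are illustrative only):

```python
# Python 3 sketch. Assumption: the input is Latin-1 encoded bytes,
# in which the single byte 0xF6 stands for U+00F6 ("ö").
raw = b'Christensen Sk\xf6ld'

# str(bytes, encoding) is the Python 3 counterpart of unicode(name, 'latin-1')
text = str(raw, 'latin-1')

# bytes.decode(encoding) produces the same str
same_text = raw.decode('latin-1')

print(text == same_text)                  # True
print(text == 'Christensen Sk\u00f6ld')   # True
# Latin-1 maps every byte to exactly one character
print(len(raw), len(text))                # 17 17
```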

Note the "u" in front of the string, signalling that it is a unicode string. If you print it, the accented letter is displayed properly:

    >>> print name.decode('latin-1')
    Christensen Sköld
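The "\xf6" in the question is only how the interactive prompt displays the value (its repr); the string does not contain a literal backslash. A Python 3 sketch contrasting the representations, noting that Python 3's repr() shows printable non-ASCII characters directly, while ascii() reproduces the escaped form:

```python
text = b'Christensen Sk\xf6ld'.decode('latin-1')

# The string holds one character, U+00F6, not the four characters "\xf6"
print(len(text))        # 17
print('\\' in text)     # False

# print() shows the character itself (given a terminal that can display it)
print(text)             # Christensen Sköld

# ascii() escapes non-ASCII code points, giving the form seen in the question
print(ascii(text))      # 'Christensen Sk\xf6ld'

# repr() in Python 3 keeps printable non-ASCII characters as-is
print(repr(text) == "'Christensen Sk\u00f6ld'")  # True
```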

BTW: when necessary, you can use the "encode" method to turn the unicode string into, e.g., a UTF-8 byte string:

    >>> name.decode('latin-1').encode('utf-8')
    'Christensen Sk\xc3\xb6ld'
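The same round trip in Python 3, as a sketch: str.encode() produces bytes, and decoding only restores the text when the same codec is used. The last line illustrates what happens with the wrong codec; Latin-1 can decode any byte sequence, so it does not raise an error but yields garbled text:

```python
text = 'Christensen Sk\u00f6ld'

# UTF-8 encodes U+00F6 as the two bytes 0xC3 0xB6
utf8 = text.encode('utf-8')
print(utf8)                  # b'Christensen Sk\xc3\xb6ld'
print(len(text), len(utf8))  # 17 18

# Decoding with the same codec restores the original text
print(utf8.decode('utf-8') == text)    # True

# Decoding with the wrong codec silently produces mojibake
print(utf8.decode('latin-1') == text)  # False
```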
Mark van Lent answered Sep 22 '22 02:09
