<blockquote> Possible Duplicate: How do I treat an ASCII string as unicode and unescape the escaped characters in it in python? How do convert unicode escape sequences to unicode characters in a python string </blockquote> I have a string that contains Unicode escape sequences, e.g. <code>\u2026</code>. Somehow I receive it as a <code>str</code> rather than as <code>unicode</code>. How do I convert it back to unicode? <pre class="prettyprint"><code>>>> a="Hello\u2026"
>>> b=u"Hello\u2026"
>>> print a
Hello\u2026
>>> print b
Hello…
>>> print unicode(a)
Hello\u2026
</code></pre> So clearly <code>unicode(a)</code> is not the answer. Then what is?

Unicode escapes only work in unicode strings, so this <pre class="prettyprint"><code>a="\u2026"
</code></pre> is actually a string of 6 characters: '\', 'u', '2', '0', '2', '6'. To make unicode out of this, use <code>decode('unicode-escape')</code>: <pre class="prettyprint"><code>a="\u2026"
print repr(a)
print repr(a.decode('unicode-escape'))

## '\\u2026'
## u'\u2026'
</code></pre>

Decode it with the <code>unicode-escape</code> codec: <pre class="prettyprint"><code>>>> a="Hello\u2026"
>>> a.decode('unicode-escape')
u'Hello\u2026'
>>> print _
Hello…
</code></pre> This is because for a non-unicode string the <code>\u2026</code> is not recognised but is instead treated as a literal series of characters (to put it more clearly, <code>'Hello\\u2026'</code>). You need to decode the escapes, and the <code>unicode-escape</code> codec can do that for you. Note that you can get <code>unicode</code> to recognise it in the same way by specifying the codec argument: <pre class="prettyprint"><code>>>> unicode(a, 'unicode-escape')
u'Hello\u2026'
</code></pre> But the <code>a.decode()</code> way is nicer.
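
Both answers target Python 2. For readers on Python 3, where str is already Unicode and the unicode type no longer exists, the same codec can be applied to bytes. This is only a minimal sketch, assuming the escaped text is pure ASCII; the helper name unescape is hypothetical and does not come from either answer:

```python
def unescape(value):
    """Turn literal escapes such as \\u2026 into the characters they name.

    Hypothetical helper for Python 3. Assumes `value` is pure ASCII.
    """
    if isinstance(value, str):
        # A str has to go back to bytes first; ASCII input is assumed here.
        value = value.encode("ascii")
    return value.decode("unicode_escape")


raw = b"Hello\\u2026"   # backslash, 'u', '2', '0', '2', '6' as six separate bytes
text = unescape(raw)
print(text)             # Hello…
```

If the input may contain non-ASCII characters, the encode("ascii") step will raise UnicodeEncodeError, so this sketch is not a general-purpose unescaper.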

Python string to unicode [duplicate]

How do you make a string containing Unicode characters in Python?

In Python 2 you have two options for creating a Unicode string from an 8-bit string: call its decode() method, or pass it to unicode(). The signature is unicode(string[, encoding, errors]). When encoding or errors is given, string must be an 8-bit string, and it is decoded with the named codec.
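
unicode() exists only in Python 2. As a hedged sketch of the same two options in Python 3, where bytes plays the role of the 8-bit string and str the role of unicode (the sample bytes are made up for illustration):

```python
data = b"caf\xc3\xa9"                  # UTF-8 bytes for "café"

via_method = data.decode("utf-8")      # option 1: method on the bytes object
via_constructor = str(data, "utf-8")   # option 2: constructor with an explicit encoding

# The errors argument controls what happens when bytes do not fit the codec.
lossy = data.decode("ascii", errors="replace")

print(via_method == via_constructor)   # True
print(lossy)                           # "caf" followed by two U+FFFD replacement characters
```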

Is UTF-8 Unicode?

UTF-8 is one of the character encodings defined for Unicode, not Unicode itself. It takes the code point of a Unicode character and translates it into a sequence of one to four bytes. It also does the reverse, reading bytes and converting them back to characters.
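
A short round trip makes the two directions concrete. This sketch assumes Python 3 and uses the ellipsis character from the question:

```python
ch = "\u2026"                      # HORIZONTAL ELLIPSIS, code point U+2026

encoded = ch.encode("utf-8")       # code point -> bytes
print(encoded)                     # b'\xe2\x80\xa6'

decoded = encoded.decode("utf-8")  # bytes -> code point
print(decoded == ch)               # True

# ASCII characters take one byte; this character takes three.
print(len("A".encode("utf-8")), len(encoded))   # 1 3
```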

How do I get Unicode in Python?

In Python 3, the built-in functions chr() and ord() convert between Unicode code points and characters: chr() takes a code point and returns the character, and ord() does the reverse. (Python 2 uses unichr() for code points above 255.) A character can also be written in a string literal as a hexadecimal code point with \x, \u, or \U.
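
As a small illustration, assuming Python 3, where every str literal accepts all three escape forms:

```python
print(ord("A"))                  # 65
print(hex(ord("\u2026")))        # 0x2026
print(chr(0x2026) == "\u2026")   # True

# Three escape forms: \x takes 2 hex digits, \u takes 4, \U takes 8.
same_a = "\x41" == "\u0041" == "\U00000041" == "A"
print(same_a)                    # True
```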

Does Python support Unicode?

Yes. In Python 3, the str type uses the Unicode Standard to represent characters, which lets Python programs work with characters from a very wide range of writing systems. In Python 2, that role belongs to the separate unicode type.

Tags:

python

string

unicode

python-2.x

python-unicode

prongs

2 Answers

georg

Chris Morgan
