Python unicode string literals :: what's the difference between '\u0391' and u'\u0391'

I am using Python 2.7.3. Can anybody explain the difference between the literals:

'\u0391'

and:

u'\u0391'

and the different ways they are echoed in the REPL below (especially the extra backslash added to a1):

>>> a1='\u0391'
>>> a1
'\\u0391'
>>> type(a1)
<type 'str'>
>>> 
>>> a2=u'\u0391'
>>> a2
u'\u0391'
>>> type(a2)
<type 'unicode'>
>>> 
asked Jan 28 '13 09:01 by Marcus Junius Brutus


2 Answers

You can only use Unicode escapes (\uabcd) in a Unicode string literal. They have no meaning in a byte string. A Python 2 Unicode literal (u'some text') is a different type of Python object from a Python byte string ('some text').

It's like using \t versus \T: the former has a meaning in Python literals (it is interpreted as a tab character), while the latter just means a backslash followed by a capital T (two characters).
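A minimal sketch of the escape behaviour described above, written for Python 3 (an assumption: the question uses 2.7, where a plain '...' literal is a byte string, like Python 3's b'...'). Unrecognized escapes such as \T, or \u inside a byte literal, are spelled out here with a doubled backslash; that produces the same string without the invalid-escape warning recent Python 3 versions may emit.

```python
# Python 3 sketch: b'...' plays the role of Python 2's plain '...' literal.
tab = '\t'                 # recognized escape: a single tab character
backslash_t = '\\T'        # what '\T' yields: a backslash plus 'T', two characters
alpha_text = '\u0391'      # \u is interpreted in a Unicode (str) literal
alpha_bytes = b'\\u0391'   # what b'\u0391' yields: \u is not an escape in bytes

print(len(tab), len(backslash_t))         # 1 2
print(len(alpha_text), len(alpha_bytes))  # 1 6
```

The byte literal keeps all six characters, exactly like a1 in the question.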

To help understand the difference between Unicode and byte strings, please do read the Python Unicode HOWTO; I can also recommend Joel Spolsky's article on Unicode.

Note: in Python 3, the same distinction applies, but 'some text' is a Unicode string literal and b'some text' is the byte string syntax.
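A small Python 3 sketch of the note above: plain quotes give a str, a b prefix gives bytes, and the two types stay distinct. The UTF-8 encoding used at the end is only an example choice for converting between them.

```python
# Python 3: plain quotes produce str (Unicode text), a b prefix produces bytes.
text = 'some text'
data = b'some text'
print(type(text), type(data))  # <class 'str'> <class 'bytes'>

# The two types never compare equal, even with identical ASCII content.
print(text == data)            # False

# Moving between them needs an explicit encoding (UTF-8 chosen here).
alpha = '\u0391'
print(alpha.encode('utf-8'))   # b'\xce\x91'
print(data.decode('ascii') == text)  # True
```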

answered Nov 11 '22 03:11 by Martijn Pieters


Unlike C, Python lets you enclose a string in single quotes (') as well as double quotes ("), leaving aside triple-quoted strings ("""). The choice of quote does not change what the literal means.

So in Python 2, '\u0391' is just a byte string containing the six characters \, u, 0, 3, 9 and 1. When the REPL echoes this string (using its repr()), the \ is escaped with another \.
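To see the escaping in the echo, here is a sketch in Python 3 (an assumption, since the question uses 2.7). The str '\\u0391' holds the same six characters as Python 2's '\u0391'; the REPL echoes values through repr(), which doubles the backslash, whereas print() shows the text as-is.

```python
# The six characters \, u, 0, 3, 9, 1 (backslash written doubled in source).
s = '\\u0391'
print(list(s))   # ['\\', 'u', '0', '3', '9', '1']
print(len(s))    # 6
print(s)         # \u0391   (the text itself, a single backslash)
print(repr(s))   # '\\u0391' (repr() escapes the backslash, as the REPL echo does)
```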

By contrast, a u prefix makes the literal a Unicode string, in which \u escapes are interpreted as well. So u'\u0391' is the one-character Unicode string containing code point U+0391 (GREEK CAPITAL LETTER ALPHA), which is different from the string above.
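A short Python 3 sketch (assumed interpreter; the u prefix has been accepted again since Python 3.3) confirming that the escape denotes a single code point:

```python
import unicodedata

alpha = u'\u0391'                # one character: code point U+0391
print(len(alpha))                # 1
print(hex(ord(alpha)))           # 0x391
print(unicodedata.name(alpha))   # GREEK CAPITAL LETTER ALPHA

# Other spellings of the same code point give an equal string.
print(alpha == chr(0x391))                            # True
print(alpha == '\N{GREEK CAPITAL LETTER ALPHA}')      # True
```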

answered Nov 11 '22 04:11 by Mihai Maruseac