Python-3 and \x Vs \u Vs \U in string encoding and why

Question

Why do we have different byte-oriented string representations in Python 3? Wouldn't a single representation be enough instead of multiple?

For a number near the ASCII range, printing the string shows a sequence starting with \x:

In [56]: chr(128)
Out[56]: '\x80'

In a different range of numbers, Python uses a sequence starting with \u:

In [57]: chr(57344)
Out[57]: '\ue000'

But for numbers in the highest range, i.e. the maximum Unicode code point as of now, it uses a leading \U:

In [58]: chr(1114111)
Out[58]: '\U0010ffff'

Martijn Pieters · Accepted Answer

Python gives you a representation of the string, and for non-printable characters will use the shortest available escape sequence.
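As a rough illustration of the printable versus non-printable distinction, here is a small sketch using only the built-ins chr, repr and str.isprintable. The helper name shown_as and the sample code points are made up for this example:

```python
def shown_as(code_point):
    """Hypothetical helper: return (is_printable, repr_text) for a code point."""
    ch = chr(code_point)
    return ch.isprintable(), repr(ch)

ascii_letter = shown_as(65)     # 'A' is printable, so repr() shows it as-is
latin_letter = shown_as(233)    # 'é' is printable too, so no escape is used
c1_control = shown_as(128)      # not printable, so repr() falls back to an escape

print(c1_control)
```

Only the last case ends up with a backslash escape in its representation; the two printable characters are shown literally.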

\x80 is the same character as \u0080 or \U00000080, but \x80 is just shorter. For chr(57344) the shortest notation is \ue000; you can't express the same character with \xhh, because that notation can only be used for characters up to \xFF.
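A quick way to check this yourself, reusing the code points from the question (a sketch, nothing beyond built-ins):

```python
# The three spellings below all denote the same one-character string.
same = '\x80' == '\u0080' == '\U00000080'
print(same)  # True

# repr() moves to a wider escape only when the code point needs it.
# These code points are non-printable, so each one is shown escaped.
print(repr(chr(0x80)))      # '\x80'
print(repr(chr(0xE000)))    # '\ue000'
print(repr(chr(0x10FFFF)))  # '\U0010ffff'
```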

For some characters there are even single-letter escapes, like \n for a newline, or \t for a tab.
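For example, the following sketch compares two characters that have a single-letter form in the representation with one control character (the bell, picked arbitrarily) that is shown as a \x escape instead:

```python
newline = chr(10)
tab = chr(9)
bell = chr(7)   # can be typed as '\a', but repr() does not use that form

print(repr(newline))  # '\n'
print(repr(tab))      # '\t'
print(repr(bell))     # '\x07'
```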

Python has multiple notation options for historical and practical reasons. In a byte string you can only create bytes in the range 0 - 255, so there \xhh is helpful and more concise than having to use \U000hhhhh everywhere when you can't even use the full range available to that notation. \xhh and related codes are also familiar to programmers from other languages.
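To make the 0 - 255 limit concrete, here is a minimal sketch with a bytes literal; the sample values are arbitrary:

```python
# Each \xhh escape in a bytes literal produces exactly one byte.
data = b'\x80\xff'
print(list(data))                  # [128, 255]
print(data == bytes([128, 255]))   # True

# A value outside 0..255 cannot be stored in a bytes object.
error = None
try:
    bytes([256])
except ValueError as exc:
    error = exc
print(type(error).__name__)        # ValueError
```

Two hex digits are exactly enough to cover every possible byte value, which is why \xhh is the natural notation there.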

Tags: python, python-3.x, unicode, python-unicode, unicode-string

MaNKuR
