
Why do Python on Mac OS X and Python on CentOS Linux interpret \U escapes in strings differently?

Below are two Python interpreter sessions. The first is from Python on CentOS. The second is from the built-in Python on Mac OS X 10.7. Why does the second session create strings of length two from the \U escape sequence, and subsequently error out?

$ python
Python 2.6.6 (r266:84292, Dec  7 2011, 20:48:22) 
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u'\U00000020'
u' '
>>> u'\U00000065'
u'e'
>>> u'\U0000FFFF'
u'\uffff'
>>> u'\U00010000'
u'\U00010000'
>>> len(u'\U00010000')
1
>>> ord(u'\U00010000')
65536

$ python
Python 2.6.7 (r267:88850, Jul 31 2011, 19:30:54) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
>>> u'\U00000020'
u' '
>>> u'\U00000065'
u'e'
>>> u'\U0000FFFF'
u'\uffff'
>>> u'\U00010000'
u'\U00010000'
>>> len(u'\U00010000')
2
>>> ord(u'\U00010000')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found
audiodude asked Oct 08 '22 15:10


1 Answer

I'm not at all sure about this, but your Mac OS X system may be using a "narrow build" of Python, which stores unicode strings internally in 16-bit units and represents code points above 0xFFFF (that is, 2**16 and up) as a pair of surrogate characters. That would explain len(u'\U00010000') == 2, and also why ord() complains about a string of length 2.
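
To make the "character pair" idea concrete, here is a rough sketch of the standard UTF-16 surrogate arithmetic that a 16-bit representation would use for such code points. The function name to_surrogate_pair is my own, not anything from Python itself, and I'm assuming the narrow build follows this scheme.

```python
def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into (high, low) UTF-16 surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point fits in 16 bits or is out of range")
    offset = code_point - 0x10000      # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


# U+10000 from the question becomes two 16-bit units.
print("%04X %04X" % to_surrogate_pair(0x10000))  # D800 DC00
```

Two 16-bit units for one code point is consistent with the length of 2 seen in the second session.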

Try unichr(0x10000) on OS X and see if you get an error referring to narrow builds. See also "What encoding do normal python strings use?", in particular IVH's answer.
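
Another way to check, as far as I know, is sys.maxunicode, which reports the largest code point a single character can hold. A minimal sketch (the helper name is_narrow_build is made up for this example):

```python
import sys


def is_narrow_build():
    """Return True if single characters are limited to 16 bits."""
    # sys.maxunicode is 0xFFFF on narrow builds and 0x10FFFF on wide builds.
    return sys.maxunicode == 0xFFFF


print("narrow build" if is_narrow_build() else "wide build")
```

If the theory is right, this should say "narrow build" on the OS X interpreter and "wide build" on the CentOS one.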

It's possible to recompile Python as a wide build (for Python 2, by passing --enable-unicode=ucs4 to configure) even if the default Python on your system is a narrow build.
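
If recompiling isn't an option, one workaround might be a small replacement for ord() that accepts a surrogate pair. This is only a sketch under the assumption that the two-character string is a valid high/low surrogate pair; wide_ord is a name I made up.

```python
def wide_ord(text):
    """Like ord(), but also accepts a two-character surrogate pair.

    Hypothetical helper for illustration only.
    """
    if len(text) == 2:
        high, low = ord(text[0]), ord(text[1])
        # Only combine when both halves fall in the surrogate ranges.
        if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    # Single characters (and anything invalid) go to the built-in ord().
    return ord(text)


print(wide_ord(u'e'))  # 101
```

On a narrow build, wide_ord(u'\U00010000') should then give 65536 instead of raising TypeError.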

Justin Blank answered Oct 12 '22 10:10