Just what the title says.
$ ./configure --help | grep -i ucs
--enable-unicode[=ucs[24]]
Searching the official documentation, I found this:
sys.maxunicode: An integer giving the largest supported code point for a Unicode character. The value of this depends on the configuration option that specifies whether Unicode characters are stored as UCS-2 or UCS-4.
What is not clear here is which value corresponds to UCS-2 and which to UCS-4.
The code is expected to work on Python 2.6+.
When built with --enable-unicode=ucs4:
>>> import sys
>>> print sys.maxunicode
1114111
When built with --enable-unicode=ucs2:
>>> import sys
>>> print sys.maxunicode
65535
It's 0xFFFF (or 65535) for UCS-2, and 0x10FFFF (or 1114111) for UCS-4, as PyUnicode_GetMax in the CPython source shows:
Py_UNICODE
PyUnicode_GetMax(void)
{
#ifdef Py_UNICODE_WIDE
    return 0x10FFFF;
#else
    /* This is actually an illegal character, so it should
       not be passed to unichr. */
    return 0xFFFF;
#endif
}
The maximum character in UCS-4 mode is defined by the maximum value representable in UTF-16.
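To see where 0x10FFFF comes from, here is a small sketch of the UTF-16 surrogate pair arithmetic. The helper name decode_surrogate_pair and the two constants are made up for illustration; they are not part of Python or CPython. It should run on both Python 2.6+ and Python 3.

```python
# Hypothetical helper, for illustration only: combine a UTF-16
# surrogate pair into the code point it encodes.
HIGH_MAX = 0xDBFF  # last high (lead) surrogate
LOW_MAX = 0xDFFF   # last low (trail) surrogate


def decode_surrogate_pair(high, low):
    """Return the code point encoded by the surrogate pair (high, low)."""
    # Each surrogate carries 10 bits; the pair is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The largest pair gives the largest code point UTF-16 can represent.
max_code_point = decode_surrogate_pair(HIGH_MAX, LOW_MAX)
print(hex(max_code_point))  # 0x10ffff
print(max_code_point)       # 1114111
```

The result matches the value that sys.maxunicode reports on a UCS-4 build.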
I had this same issue once. I documented it for myself on my wiki at
http://arcoleo.org/dsawiki/Wiki.jsp?page=Python%20UTF%20-%20UCS2%20or%20UCS4
I wrote:
import sys
sys.maxunicode > 65536 and 'UCS4' or 'UCS2'
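If you want the same check as a reusable function, something like the following might do. The name unicode_build and its optional argument are my own invention, not a standard API. Note that on Python 3.3 and later sys.maxunicode is always 1114111, so the distinction only matters on older interpreters.

```python
import sys


def unicode_build(maxunicode=None):
    """Return 'UCS2' or 'UCS4' for the given sys.maxunicode value.

    Hypothetical helper: defaults to the running interpreter's value.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow (UCS-2) builds report 0xFFFF; wide builds report 0x10FFFF.
    return 'UCS4' if maxunicode > 0xFFFF else 'UCS2'


print(unicode_build())
```

Passing the value explicitly, e.g. unicode_build(0xFFFF), makes the function easy to check without needing both kinds of build at hand.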