
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

Tags: python, html

I am trying to convert an HTML entity to a Unicode character. The entity is &#976918;. When I try the following:

unichr(int(976918))

I get this error:

ValueError: unichr() arg not in range(0x10000) (narrow Python build)

It seems the value is out of range for unichr().

Aamir Rind asked Aug 18 '11 10:08


3 Answers

You can decode a string that has a Unicode escape (\U followed by 8 hex digits, zero-padded) using the "unicode-escape" encoding:

>>> s = "\\U%08x" % 976918
>>> s
'\\U000ee816'

>>> c = s.decode('unicode-escape')
>>> c
u'\U000ee816'

On a narrow build it's stored as a UTF-16 surrogate pair:

>>> list(c)
[u'\udb7a', u'\udc16']
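
The two values above follow from the standard UTF-16 surrogate arithmetic. As a rough sketch (the helper name to_surrogate_pair is made up for illustration, it is not part of Python), the split can be reproduced by hand on either Python 2 or 3:

```python
def to_surrogate_pair(cp):
    """Split a code point above 0xFFFF into UTF-16 high/low surrogates.

    Hypothetical helper, written only to illustrate the arithmetic.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000            # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go in the low surrogate
    return high, low


high, low = to_surrogate_pair(976918)
print("%04x %04x" % (high, low))  # prints: db7a dc16
```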

This surrogate pair (two UTF-16 code units) is processed correctly as a single code point during encoding:

>>> c.encode('utf-8')
'\xf3\xae\xa0\x96'

>>> '\xf3\xae\xa0\x96'.decode('utf-8')
u'\U000ee816'
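
To see why those four bytes come out, here is a small sketch of what the encoder effectively has to do: join the pair back into one code point, then lay its bits out in the 4-byte UTF-8 form. Both function names are invented for this example, and the byte layout is only handled for code points from 0x10000 to 0x10FFFF.

```python
def from_surrogate_pair(high, low):
    """Join a UTF-16 high/low surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf8_four_bytes(cp):
    """Byte values of the 4-byte UTF-8 form (0x10000..0x10FFFF only)."""
    return [
        0xF0 | (cp >> 18),           # lead byte: 11110xxx
        0x80 | ((cp >> 12) & 0x3F),  # continuation bytes: 10xxxxxx
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ]


cp = from_surrogate_pair(0xDB7A, 0xDC16)
print(cp)                                         # prints: 976918
print(["%02x" % b for b in utf8_four_bytes(cp)])  # prints: ['f3', 'ae', 'a0', '96']
```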
Eryk Sun answered Oct 04 '22 20:10


Here's an alternate workaround that I developed with the struct module.

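import struct

def unichar(i):
    try:
        return unichr(i)
    except ValueError:
        # Narrow build: pack the code point as a native-order 32-bit int
        # and decode those four bytes as UTF-32.
        return struct.pack('i', i).decode('utf-32')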

>>> unichar(int('976918'))
u'\U000ee816'
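
Since the question starts from an HTML numeric entity, the same struct trick can be wrapped in a small parser. This is only a sketch: entity_to_char and code_point_to_char are made-up names, the regex covers just the decimal and hex numeric forms, and the byte order is spelled out explicitly ('<I' with 'utf-32-le') instead of relying on the native order. It should run on both Python 2 and 3.

```python
import re
import struct


def code_point_to_char(cp):
    """Decode one code point via UTF-32, without calling unichr()/chr()."""
    # Little-endian unsigned 32-bit int, decoded with a matching byte order.
    return struct.pack('<I', cp).decode('utf-32-le')


def entity_to_char(entity):
    """Convert a numeric entity such as '&#976918;' or '&#xEE816;'.

    Hypothetical helper; named entities like '&amp;' are not handled.
    """
    match = re.match(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));$', entity)
    if match is None:
        raise ValueError("not a numeric character reference: %r" % entity)
    hex_digits, dec_digits = match.groups()
    cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
    return code_point_to_char(cp)


c = entity_to_char('&#976918;')
print(repr(c.encode('utf-8')))
```

Values that are not valid code points (for example anything above 0x10FFFF) make the decode step raise an error rather than return a character.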
Mark Ransom answered Oct 04 '22 21:10


For unichr() itself to accept code points above 0xFFFF, you either need to build Python yourself, specifying

./configure --enable-unicode=ucs4

before compiling, or else you need to move to Python 3.

Even with Python 3, there are apparently still problems on Windows, which should be fixed in the next version of Python (3.3).
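
For comparison, a short Python 3 sketch of the same conversion. It assumes Python 3.4 or later, because html.unescape was added in 3.4, and the length check assumes 3.3 or later, where a string no longer uses surrogate pairs internally:

```python
import html

# chr() accepts the full Unicode range, so no workaround is needed here.
c = chr(976918)
print(len(c))             # prints: 1
print(c.encode('utf-8'))  # prints: b'\xf3\xae\xa0\x96'

# html.unescape() decodes the numeric entity from the question directly.
print(html.unescape('&#976918;') == c)  # prints: True
```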

agf answered Oct 04 '22 20:10