How to convert unicode numbers to ints?

Question

I read the following claim: "Arabic and Chinese have their own glyphs for digits. int works correctly with all the different ways to write numbers."

I was not able to reproduce this behaviour (Python 3.5.0):

>>> from unicodedata import name
>>> name('𐹤')
'RUMI DIGIT FIVE'
>>> int('𐹤')
ValueError: invalid literal for int() with base 10: '𐹤'
>>> int('五')  # chinese/japanese number five
ValueError: invalid literal for int() with base 10: '五'

Am I doing something wrong? Or is the claim simply incorrect (source)?

Jean-François Fabre · Accepted Answer

Here's a way to convert to numerical values (casting to int does not work in all cases, unless there's a secret setting somewhere):

from unicodedata import numeric
print(numeric('五'))

result: 5.0
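For reference, a minimal sketch (assuming CPython's standard unicodedata module) that applies numeric to the other characters from the question as well. The optional second argument of numeric is a default that is returned instead of raising ValueError when a character has no numeric value:

```python
from unicodedata import name, numeric

# Chinese five, Rumi digit five, the fraction three quarters, Arabic-Indic digit five
for ch in ['五', '\U00010E64', '\u00be', '\u0665']:
    # numeric() returns a float for any character that has a Unicode numeric value
    print(name(ch), numeric(ch))

# A character without a numeric value raises ValueError unless a default is given
print(numeric('x', None))  # None
```

So the Rumi digit from the question is also covered by numeric, even though int rejects it.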

Someone noted (correctly) that some Arabic and other characters work fine with int, so a routine with a fallback mechanism could be written:

from unicodedata import numeric

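def to_integer(s):
    try:
        # int() already accepts decimal digits from other scripts, e.g. Arabic-Indic
        r = int(s)
    except ValueError:
        # otherwise use the Unicode numeric value (single characters only)
        r = int(numeric(s))
    return r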

EDIT: as zvone noted, some fraction characters have non-integral numeric values: for example, numeric('\u00be') is 0.75 (the ¾ character). So converting the result to int is not always safe: int(0.75) silently truncates to 0.
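If an integer result is really needed, one option is to refuse non-integral values instead of truncating them. This is only a sketch; to_integer_strict is a hypothetical name, and the check relies on the standard float.is_integer method:

```python
from unicodedata import numeric

def to_integer_strict(s):
    """Hypothetical variant of to_integer that refuses non-integral values."""
    try:
        # int() handles strings of decimal digits, including non-ASCII ones like '۵۵'
        return int(s)
    except ValueError:
        pass
    # Fall back to the numeric value of a single character; numeric() raises
    # TypeError for multi-character strings and ValueError for non-numeric ones
    value = numeric(s)
    if not value.is_integer():
        # e.g. '¾' has the value 0.75, which has no exact integer equivalent
        raise ValueError('{!r} has non-integral value {}'.format(s, value))
    return int(value)

print(to_integer_strict('۵۵'))  # 55
print(to_integer_strict('五'))   # 5
```

With this version, to_integer_strict('¾') raises ValueError rather than returning 0.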

EDIT2: the numeric function only accepts a single character. So a "conversion to numeric" that handles most cases without any risk of rounding would be:

from unicodedata import numeric

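def to_float(s):
    try:
        # float() accepts decimal-digit strings in any script, e.g. '۵۵'
        r = float(s)
    except ValueError:
        # fall back to the numeric value of a single character such as '五' or '¾'
        r = numeric(s)
    return r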

print(to_float('۵۵'))
print(to_float('五'))
print(to_float('¾'))

result:

55.0
5.0
0.75
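One caveat with to_float: for a multi-character string that float rejects, such as '五五', numeric raises TypeError rather than ValueError, so callers catching only ValueError will miss it. A minimal sketch of a wrapper that reports every failure as ValueError (to_float_checked is a hypothetical name, not part of any library):

```python
from unicodedata import numeric

def to_float_checked(s):
    """Hypothetical variant of to_float that always raises ValueError on failure."""
    try:
        # float() accepts decimal-digit strings in any script, e.g. '۵۵'
        return float(s)
    except ValueError:
        pass
    if len(s) != 1:
        # numeric() only takes one character; a string like '五五' is not a
        # positional number, so there is no single value to return
        raise ValueError('cannot convert {!r} to a number'.format(s))
    # numeric() itself raises ValueError for characters without a numeric value
    return numeric(s)

print(to_float_checked('۵۵'))  # 55.0
print(to_float_checked('¾'))   # 0.75
```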

(I don't want to steal user2357112's excellent explanation, but I still wanted to provide a solution that tries to cover all cases.)

user2357112 supports Monica · Answer

int does not accept all ways to write numbers. It understands digit characters used for positional numeral systems, but neither Rumi nor Chinese numerals are positional. Neither '五五' nor two copies of Rumi numeral 5 would represent 55, so int doesn't accept them.
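One way to see this distinction is to ask unicodedata for each character's general category and decimal value. A minimal sketch, assuming CPython's unicodedata module; as far as I know, int accepts exactly the characters that have a decimal value (general category Nd, "Number, decimal digit"), while Rumi and CJK numerals only carry a numeric value. explain is a hypothetical helper:

```python
import unicodedata

def explain(ch):
    """Return (category, decimal value, numeric value, int() result) for one character."""
    # decimal() is defined only for category Nd digits, the ones usable positionally
    dec = unicodedata.decimal(ch, None)
    try:
        as_int = int(ch)
    except ValueError:
        as_int = None
    return unicodedata.category(ch), dec, unicodedata.numeric(ch, None), as_int

# Arabic-Indic five, extended Arabic-Indic five, Rumi digit five, Chinese five
for ch in ['\u0665', '\u06f5', '\U00010E64', '五']:
    print(unicodedata.name(ch), explain(ch))
```

The first two characters get a decimal value and an int result; the last two only get a numeric value, which matches the errors in the question.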

Tags: python, python-3.x, unicode

wim

2 Answers

Jean-François Fabre

user2357112 supports Monica
