Arabic and Chinese have their own glyphs for digits. `int` works correctly with all the different ways to write numbers.
I was not able to reproduce this behaviour (Python 3.5.0):
```python
>>> from unicodedata import name
>>> name('𐹤')
'RUMI DIGIT FIVE'
>>> int('𐹤')
ValueError: invalid literal for int() with base 10: '𐹤'
>>> int('五')  # Chinese/Japanese numeral five
ValueError: invalid literal for int() with base 10: '五'
```
Am I doing something wrong, or is the claim simply incorrect (source)?
Here's a way to convert to numerical values (casting to `int` does not work in all cases, unless there's a secret setting somewhere):
```python
from unicodedata import numeric
print(numeric('五'))
```

Result: `5.0`
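For reference, `unicodedata.numeric` raises `ValueError` for a character that has no numeric value, unless you pass a default as the second argument. A small sketch of both behaviours:

```python
from unicodedata import numeric

# numeric() returns the character's Unicode numeric value as a float
print(numeric('五'))   # 5.0
print(numeric('¾'))    # 0.75

# With a default, a non-numeric character returns the default instead
print(numeric('a', None))  # None

# Without a default, a non-numeric character raises ValueError
try:
    numeric('a')
except ValueError as exc:
    print('ValueError:', exc)
```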
Someone noted (correctly) that some Arabic and other characters work fine with `int`, so a routine with a fallback mechanism could be written:
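```python
from unicodedata import numeric

def to_integer(s):
    try:
        # int() handles strings of decimal digits, whatever the script
        r = int(s)
    except ValueError:
        # fall back to the Unicode numeric value of a single character
        r = int(numeric(s))
    return r
```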
EDIT: as zvone noted, there are fraction characters for which `numeric` returns a non-integer float: for example, `numeric('\u00be')` is `0.75` (the ¾ character). So converting the result to `int` is not always safe, because `int(0.75)` silently truncates to `0`.
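A sketch of a stricter variant that follows from that EDIT. `to_integer_strict` is a hypothetical name for this example, not a library function; it keeps the same `int`-then-`numeric` fallback as `to_integer`, but raises `ValueError` for characters like `¾` instead of truncating them:

```python
from unicodedata import numeric


def to_integer_strict(s):
    """Hypothetical helper: like to_integer, but refuses fractional values."""
    try:
        # decimal-digit strings from any script, e.g. '۵۵'
        return int(s)
    except ValueError:
        pass
    # still limited to a single character, like to_integer
    value = numeric(s)
    if not value.is_integer():
        raise ValueError('{!r} has non-integer value {}'.format(s, value))
    return int(value)


print(to_integer_strict('۵۵'))  # 55
print(to_integer_strict('五'))   # 5
try:
    to_integer_strict('¾')
except ValueError as exc:
    print('ValueError:', exc)
```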
EDIT2: the `numeric` function only accepts a single character. So a "conversion to numeric" that handles most cases without the risk of losing the fractional part would be:
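```python
from unicodedata import numeric

def to_float(s):
    try:
        # float() handles decimal-digit strings, e.g. '۵۵'
        r = float(s)
    except ValueError:
        # numeric() handles single characters such as '五' or '¾'
        r = numeric(s)
    return r

print(to_float('۵۵'))
print(to_float('五'))
print(to_float('¾'))
```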
Result:

```
55.0
5.0
0.75
```
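One caveat with `to_float`: because `numeric` only takes one character, a multi-character string that `float` also rejects (such as `'五五'`) makes `numeric` raise an error about its argument (a `TypeError`, as far as I can tell) rather than about the number. A sketch of a variant, with the hypothetical name `to_float_checked`, that reports such input as a `ValueError` instead:

```python
from unicodedata import numeric


def to_float_checked(s):
    """Hypothetical variant of to_float with a clearer multi-character error."""
    try:
        # decimal-digit strings from any script, e.g. '۵۵'
        return float(s)
    except ValueError:
        pass
    if len(s) != 1:
        # numeric() cannot help here: it only accepts a single character
        raise ValueError('cannot convert {!r}: not a decimal string and '
                         'not a single numeric character'.format(s))
    # raises ValueError if the character has no numeric value
    return numeric(s)


print(to_float_checked('۵۵'))  # 55.0
print(to_float_checked('¾'))   # 0.75
```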
(I don't want to steal user2357112's excellent explanation, but I still wanted to provide a solution that tries to cover as many cases as possible.)
`int` does not accept all ways to write numbers. It understands digit characters used for positional numeral systems, but neither Rumi nor Chinese numerals are positional. Neither `'五五'` nor two copies of Rumi numeral 5 would represent 55, so `int` doesn't accept them.
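A quick way to see which characters `int` treats as digits is `str.isdecimal()`: as far as I know, `int` parses exactly the Unicode decimal digits (category Nd, e.g. Arabic-Indic digits), which is the set `isdecimal()` checks, while `isdigit()` and `isnumeric()` are broader. A small sketch (`describe` is just a helper name for this example):

```python
def describe(s):
    """Return (isdecimal, isdigit, isnumeric, int(s) or None) for s."""
    try:
        value = int(s)
    except ValueError:
        value = None
    return s.isdecimal(), s.isdigit(), s.isnumeric(), value


# ASCII, Extended Arabic-Indic, Arabic-Indic, CJK five, superscript two, three quarters
for s in ['55', '۵۵', '٥٥', '五', '²', '¾']:
    print(repr(s), describe(s))
```

Only the strings whose characters are all decimal digits get an integer back; `'五'`, `'²'` and `'¾'` are numeric characters but not positional digits, so `int` rejects them.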