I am parsing a webpage which has Unicode representations of fractions. I would like to be able to take those strings directly and convert them to floats. For example:
"⅕" would become 0.2
Any suggestions of how to do this in Python?
Python's built-in float() function converts a string or a number to a floating-point number; for other objects it calls the object's __float__() method. It does not parse fraction characters, though: float("⅕") raises ValueError.
To include Unicode characters in your Python source code, you can use Unicode escape sequences of the form \u0123 in a string literal. In Python 2.x, you also need to prefix the string literal with 'u'.
Python's string type uses the Unicode Standard to represent characters, so Python programs can work with characters such as "⅕" directly.
You want to use the unicodedata module:
import unicodedata
unicodedata.numeric(u'⅕')
In an interactive session this returns:
0.2
Interpreters older than Python 2.7 and 3.1 display the same value as 0.20000000000000001.
If the character does not have a numeric value, unicodedata.numeric(unichr[, default]) returns default, or raises ValueError if default is not given.
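As a minimal sketch of the default behaviour described above, the wrapper below returns a fallback instead of raising. The name char_to_float is hypothetical and not part of unicodedata; it only forwards to unicodedata.numeric with a default supplied.

```python
import unicodedata

def char_to_float(ch, default=None):
    """Return the numeric value of a single character, or `default` if it has none.

    `char_to_float` is an illustrative helper, not a standard-library function.
    """
    # Passing a second argument makes numeric() return it instead of raising ValueError.
    return unicodedata.numeric(ch, default)

print(char_to_float("⅕"))       # fraction character with a numeric value
print(char_to_float("x"))       # no numeric value, so the default (None) comes back
print(char_to_float("x", 0.0))  # caller-chosen fallback
```

Note that unicodedata.numeric accepts exactly one character, so longer strings still need to be split up first.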
These Unicode fraction characters are called vulgar fractions.
You can convert them to floats using unicodedata.numeric(char).
However, numeric(char) won't work on something like 3¾. That takes a bit more effort:
from unicodedata import numeric

samples = ["3¼", "19¼", "3 ¼", "10"]

for i in samples:
    if len(i) == 1:
        # Single character: look up its numeric value directly
        v = numeric(i)
    elif i[-1].isdigit():
        # Normal number, ending in [0-9]
        v = float(i)
    else:
        # Assume the last character is a vulgar fraction
        v = float(i[:-1]) + numeric(i[-1])
    print(i, v)
Output:
3¼ 3.25
19¼ 19.25
3 ¼ 3.25
10 10.0
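The same logic can be wrapped in a reusable function. This is only a sketch under a few assumptions: the name parse_mixed_number is hypothetical, the input is a non-negative number, and any fraction character appears last in the string.

```python
from unicodedata import numeric

def parse_mixed_number(text):
    """Convert strings such as "3¼", "3 ¼", "¼" or "10" to a float.

    Illustrative helper; assumes a non-negative value with the fraction last.
    """
    text = text.strip()
    if not text:
        raise ValueError("empty string")
    whole, last = text[:-1].strip(), text[-1]
    if last in "0123456789":
        # Plain number ending in an ASCII digit: let float() handle it.
        return float(text)
    # Otherwise assume the last character is a vulgar fraction;
    # numeric() raises ValueError if it has no numeric value.
    fraction = numeric(last)
    return float(whole) + fraction if whole else fraction

for sample in ["3¼", "19¼", "3 ¼", "10", "⅕"]:
    print(sample, parse_mixed_number(sample))
```

Negative mixed numbers such as "-3¼" are not handled correctly here, because the fraction is always added to the whole part.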
You might also be interested in isolating these vulgar fractions from broader user input using regular expressions. You can do so using ranges of their Unicode code points (shown here in JavaScript regex literal syntax):
/[\u2150-\u215E\u00BC-\u00BE]/g
Sample: https://regexr.com/3p8nd
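Since the question is about Python, here is a rough equivalent using the re module with the same two code point ranges. The names VULGAR_FRACTION and find_fractions are hypothetical, chosen only for this sketch.

```python
import re
from unicodedata import numeric

# Same ranges as the pattern above: U+2150..U+215E and U+00BC..U+00BE.
VULGAR_FRACTION = re.compile("[\u2150-\u215E\u00BC-\u00BE]")

def find_fractions(text):
    """Return (character, value) pairs for each vulgar fraction found in text."""
    return [(ch, numeric(ch)) for ch in VULGAR_FRACTION.findall(text)]

print(find_fractions("Add ½ cup sugar and ¾ tsp salt"))
```

Python's findall returns every match, so no counterpart to the g flag is needed.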