
How do I convert unicode characters to floats in Python?

I am parsing a webpage which has Unicode representations of fractions. I would like to be able to take those strings directly and convert them to floats. For example:

"⅕" would become 0.2

Any suggestions of how to do this in Python?

Paul asked Aug 12 '09 01:08





2 Answers

You want to use the unicodedata module:

import unicodedata
unicodedata.numeric(u'⅕')

In an interactive session this returns:

0.2

(Older interpreters such as Python 2.6 display the same value as 0.20000000000000001.)

If the character does not have a numeric value, unicodedata.numeric(chr[, default]) returns default, or raises ValueError if default is not given.
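
A minimal sketch of both behaviours, assuming Python 3. The sample characters and variable names are chosen here for illustration only:

```python
import unicodedata

# A character with a numeric value returns that value as a float.
one_fifth = unicodedata.numeric('⅕')

# A character without a numeric value returns the supplied default...
fallback = unicodedata.numeric('x', None)

# ...or raises ValueError when no default is given.
try:
    unicodedata.numeric('x')
    raised = False
except ValueError:
    raised = True

print(one_fifth, fallback, raised)
```

Passing a default is handy when scanning arbitrary text, since it avoids wrapping every call in try/except.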

Karl Voigtland answered Sep 19 '22 21:09



Those Unicode representations of fractions are called vulgar fractions.

You can convert them to floats using unicodedata.numeric(char).

However, numeric(char) accepts only a single character, so it won't work on something like 3¼. That takes a bit more effort:

from unicodedata import numeric

samples = ["3¼","19¼","3 ¼","10"]

for i in samples:
    if len(i) == 1:
        v = numeric(i)
    elif i[-1].isdigit():
        # normal number, ending in [0-9]
        v = float(i)
    else:
        # Assume the last character is a vulgar fraction
        v = float(i[:-1]) + numeric(i[-1])
    print(i, v)

Output:

3¼ 3.25
19¼ 19.25
3 ¼ 3.25
10 10.0

You might also be interested in isolating these vulgar fractions from broader user input using regular expressions. You can do so using ranges of their Unicode code points (shown here as a JavaScript-style regex literal):

/[\u2150-\u215E\u00BC-\u00BE]/g

Sample: https://regexr.com/3p8nd
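
Since the question is about Python, here is a rough equivalent using the re module. This is only a sketch: it reuses the same character ranges as the pattern above, and the names VULGAR_FRACTION and text, as well as the sample sentence, are made up for illustration:

```python
import re
from unicodedata import numeric

# Same character ranges as the pattern above, written for Python's re module.
VULGAR_FRACTION = re.compile('[\u2150-\u215E\u00BC-\u00BE]')

# Hypothetical input containing a few vulgar fractions.
text = 'Add ½ cup of sugar and ¾ cup of milk, then ⅓ of the flour.'

# Collect every matching character, then convert each one to a float.
found = VULGAR_FRACTION.findall(text)
values = [numeric(ch) for ch in found]

print(found)
print(values)
```

findall plays the role of the g flag here, returning every match rather than only the first.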

Jason Lewallen answered Sep 20 '22 21:09
