Parsing unicode 'Vulgar Fractions' into double in Java

Question

I am scraping some data off a website, and parts of it include fractions in Unicode, e.g. 6' 5¼". I have successfully used the regex (\d)' (\d{1,2}([\xbc-\xbe])?)\" to extract each part of the string.

This gives me two strings, one is "6" and the other is "5¼".

The troublesome part is the bit that contains the Unicode vulgar fraction. Obviously it does not parse correctly using Double.parseDouble.

I have looked everywhere for Java examples but have been unable to find any. How would I go about getting ¼ out as 0.25?

If it makes it easier, I can split the regex up again so it returns the fraction part separately, giving me three strings instead of two.

bobince · Accepted Answer

There is a way to do it without maintaining your own table mapping vulgar fractions to values: you can use the one built into the Unicode data.

If you convert to Unicode Normalization Form KD, it'll decompose fractions into a fraction-slash (U+2044) surrounded by plain numbers. So you could do something like:

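import java.text.Normalizer;

// NFKD turns U+00BC into "1", U+2044 (fraction slash), "4"; split on the slash.
String[] fraction = Normalizer.normalize("¼", Normalizer.Form.NFKD).split("\u2044");
if (fraction.length == 2) {
    // Cast before dividing so this is not integer division.
    double value = (double) Integer.parseInt(fraction[0]) / Integer.parseInt(fraction[1]);
}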
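
One thing to watch with the snippet above: if you normalize the whole string "5¼" in one go, the 5 runs into the numerator and you get "51⁄4", which would come out as 12.75. So the whole-number digits need to be peeled off before normalizing. Here is a minimal sketch of that, assuming the input is ASCII digits optionally followed by a single fraction character. The class and method names (VulgarFractions, fractionValue, parseMixed) are made up for illustration.

```java
import java.text.Normalizer;

final class VulgarFractions {
    /** Value of one fraction character, e.g. U+00BC gives 0.25 (hypothetical helper). */
    static double fractionValue(String fractionChar) {
        // NFKD rewrites the character as numerator, U+2044 (fraction slash), denominator.
        String[] parts = Normalizer.normalize(fractionChar, Normalizer.Form.NFKD).split("\u2044");
        if (parts.length != 2) {
            throw new NumberFormatException("Not a vulgar fraction: " + fractionChar);
        }
        return (double) Integer.parseInt(parts[0]) / Integer.parseInt(parts[1]);
    }

    /** Parses leading ASCII digits plus an optional trailing fraction character. */
    static double parseMixed(String text) {
        int i = 0;
        // Consume the whole-number part first so it cannot merge with the numerator.
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        double whole = (i == 0) ? 0 : Integer.parseInt(text.substring(0, i));
        String rest = text.substring(i);
        return rest.isEmpty() ? whole : whole + fractionValue(rest);
    }
}
```

With this, VulgarFractions.parseMixed("5¼") should give 5.25, and anything after the digits that does not decompose into numerator/slash/denominator is rejected with a NumberFormatException rather than silently ignored.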

There are more fraction characters than the ones in the range U+00BC–U+00BE, for example ⅛ (U+215B), so if you want to avoid hard-coding that range, I'd suggest changing that part of the regexp to something like [^\d]?.
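
To show how the pieces could fit together, here is a rough sketch that uses the looser [^\d]? class and pulls the fraction out as its own group (the three-strings variant mentioned in the question). The HeightParser class, the toInches method and the choice to return inches are all assumptions for the example, not something from the original code. Since [^\d]? will happily match any non-digit, the sketch rejects a character that does not decompose into a fraction.

```java
import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeightParser {
    // Groups: 1 = feet, 2 = whole inches, 3 = optional single non-digit (the fraction character).
    private static final Pattern HEIGHT = Pattern.compile("(\\d)' (\\d{1,2})([^\\d]?)\"");

    /** Returns the height in inches, e.g. 77.25 for 6 ft 5 1/4 in (hypothetical helper). */
    static double toInches(String text) {
        Matcher m = HEIGHT.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("No height found in: " + text);
        }
        double inches = Integer.parseInt(m.group(1)) * 12 + Integer.parseInt(m.group(2));
        String frac = m.group(3);
        if (!frac.isEmpty()) {
            // NFKD rewrites the character as numerator, U+2044 (fraction slash), denominator.
            String[] parts = Normalizer.normalize(frac, Normalizer.Form.NFKD).split("\u2044");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Not a fraction character: " + frac);
            }
            inches += (double) Integer.parseInt(parts[0]) / Integer.parseInt(parts[1]);
        }
        return inches;
    }
}
```

Calling HeightParser.toInches on the string 6' 5¼" should give 77.25, and a value with no fraction such as 6' 11" still parses because the third group is allowed to be empty.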

Tags:

java

regex

double

unicode

Matt Williams
