
Parse a non-ascii (unicode) number-string as integer in .NET

Tags: .net, unicode

I have a string containing a number in a non-ASCII format, e.g. the Unicode BENGALI DIGIT ONE (U+09E7): "১"

How do I parse this as an integer in .NET?

Note: I've tried calling int.Parse() with a Bengali culture ("bn-BD") as the IFormatProvider, but it doesn't work.

asked May 26 '11 at 15:05 by James McCormack
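A minimal reproduction of the problem may help. The sketch below (the class name `Repro` is just for illustration) shows that .NET does classify U+09E7 as a decimal digit with numeric value 1, while int.TryParse still rejects it. It uses CultureInfo.InvariantCulture rather than "bn-BD" so it runs even where Bengali culture data is unavailable; since only ASCII '0'-'9' are parsed as digits, the choice of culture should not change the result.

```csharp
using System;
using System.Globalization;

static class Repro
{
    static void Main()
    {
        char bengaliOne = '\u09E7'; // BENGALI DIGIT ONE, "১"

        // The character is a Unicode decimal digit (category Nd) with value 1...
        Console.WriteLine(char.IsDigit(bengaliOne));         // True
        Console.WriteLine(char.GetNumericValue(bengaliOne)); // 1

        // ...yet int.TryParse rejects it, because the number parser
        // only treats ASCII '0'-'9' as digits.
        bool ok = int.TryParse(bengaliOne.ToString(), NumberStyles.Integer,
                               CultureInfo.InvariantCulture, out _);
        Console.WriteLine(ok);                               // False
    }
}
```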


2 Answers

You could create a new string that is the same as the old string, except that the native digits are replaced with Latin (ASCII) decimal digits. This can be done reliably by looping through the characters and testing each one with char.IsDigit(char). If that returns true, convert the character with char.GetNumericValue(char) and append the resulting digit instead of the original character.

Like this:

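using System.Text;

static class DigitHelper
{
    // Returns a copy of text in which every decimal-digit char (Bengali,
    // Arabic-Indic, fullwidth, ...) is replaced by its ASCII equivalent
    // '0'-'9'. All other characters are copied unchanged.
    public static string ConvertNativeDigits(this string text)
    {
        if (text == null)
            return null;
        if (text.Length == 0)
            return string.Empty;
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char character in text)
        {
            // char.IsDigit is true only for UnicodeCategory.DecimalDigitNumber
            if (char.IsDigit(character))
                // GetNumericValue returns a double (1.0 for '১'); cast so "1" is appended
                sb.Append((int)char.GetNumericValue(character));
            else
                sb.Append(character);
        }
        return sb.ToString();
    }
}


string bengaliNumber = "১"; // U+09E7 BENGALI DIGIT ONE
int value = int.Parse(bengaliNumber.ConvertNativeDigits()); // value == 1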
answered Sep 29 '22 at 13:09 by Jeffrey L Whitledge
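If only the integer is needed, the same char.IsDigit / char.GetNumericValue idea can accumulate the value directly and report failure instead of throwing. This is a minimal sketch, not a drop-in replacement for int.TryParse: `UnicodeIntParser` and `TryParseUnicodeDigits` are hypothetical names, and the helper accepts only unsigned runs of digits (no sign, whitespace, or group separators). It also accepts mixed scripts such as "1১", which may or may not be what you want.

```csharp
using System;

static class UnicodeIntParser
{
    // Hypothetical helper: parses a string consisting solely of Unicode
    // decimal digits (any script) into an int. Returns false for null/empty
    // input, any non-digit character, or a value that overflows Int32.
    public static bool TryParseUnicodeDigits(string text, out int value)
    {
        value = 0;
        if (string.IsNullOrEmpty(text))
            return false;

        int result = 0;
        foreach (char c in text)
        {
            // Category Nd characters always have a numeric value of 0-9.
            if (!char.IsDigit(c))
                return false;

            int digit = (int)char.GetNumericValue(c);

            // result * 10 + digit <= int.MaxValue  <=>  result <= (int.MaxValue - digit) / 10
            if (result > (int.MaxValue - digit) / 10)
                return false;

            result = result * 10 + digit;
        }

        value = result;
        return true;
    }
}
```

Usage: `UnicodeIntParser.TryParseUnicodeDigits("১২৩", out int n)` returns true with n set to 123.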


It looks like this is not possible with the built-in functionality. According to the documentation:

The only Unicode digits that the .NET Framework parses as decimals are the ASCII digits 0 through 9, specified by the code values U+0030 through U+0039.

...

The attempts to parse the Unicode code values for Fullwidth digits, Arabic-Indic digits, and Bengali digits fail and throw an exception.

(emphasis mine)

This is strange, since CultureInfo("bn-BD").NumberFormat.NativeDigits does contain them.

answered Sep 29 '22 at 14:09 by Oded
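Because NumberFormatInfo.NativeDigits lists a culture's digits in order from 0 to 9, one option is to use that array as a lookup table and swap each native digit for the ASCII digit at the same index before calling int.Parse. This is a sketch under assumptions: `NativeDigitConverter` and `ToAsciiDigits` are hypothetical names, and whether "bn-BD" is available and reports Bengali digits depends on the globalization data your runtime uses (for example, the culture may not exist in globalization-invariant mode).

```csharp
using System;
using System.Text;

static class NativeDigitConverter
{
    // Hypothetical helper: nativeDigits[d] is the culture's glyph for digit d
    // (the layout of NumberFormatInfo.NativeDigits). Every occurrence of it is
    // replaced by the ASCII digit d; all other characters are left unchanged.
    public static string ToAsciiDigits(string text, string[] nativeDigits)
    {
        if (text == null)
            throw new ArgumentNullException(nameof(text));
        if (nativeDigits == null || nativeDigits.Length != 10)
            throw new ArgumentException("Expected exactly 10 native digits.", nameof(nativeDigits));

        var sb = new StringBuilder(text);
        for (int d = 0; d < 10; d++)
            sb.Replace(nativeDigits[d], ((char)('0' + d)).ToString());
        return sb.ToString();
    }

    // Usage (assumes the runtime has "bn-BD" culture data):
    //   var bn = System.Globalization.CultureInfo.GetCultureInfo("bn-BD");
    //   int n = int.Parse(ToAsciiDigits("১২৩", bn.NumberFormat.NativeDigits));
}
```

Unlike the char.IsDigit approach, this only converts the digits of one chosen culture, which can be useful if digits from other scripts should be rejected by int.Parse rather than silently accepted.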