
Why does Char.IsDigit return true for chars which can't be parsed to int?

I often use Char.IsDigit to check whether a char is a digit, which is especially handy in LINQ queries as a pre-check for int.Parse, as here: "123".All(Char.IsDigit).

But there are chars which are digits yet can't be parsed to int, like ۵.

// true
bool isDigit = Char.IsDigit('۵'); 

var cultures = CultureInfo.GetCultures(CultureTypes.SpecificCultures);
int num;
// false
bool isIntForAnyCulture = cultures
    .Any(c => int.TryParse('۵'.ToString(), NumberStyles.Any, c, out num)); 

Why is that? Is my int.Parse-precheck via Char.IsDigit thus incorrect?

There are 310 chars which are digits:

List<char> digitList = Enumerable.Range(0, UInt16.MaxValue)
   .Select(i => Convert.ToChar(i))
   .Where(c => Char.IsDigit(c))
   .ToList(); 

Here's the implementation of Char.IsDigit in .NET 4 (ILSpy):

public static bool IsDigit(char c)
{
    if (char.IsLatin1(c))
    {
        return c >= '0' && c <= '9';
    }
    return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber;
}

So why are there chars that belong to the DecimalDigitNumber category ("Decimal digit character, that is, a character in the range 0 through 9...") which can't be parsed to an int in any culture?

Tim Schmelter asked Feb 27 '14 08:02



1 Answer

Char.IsDigit returns true for every character in the Unicode "Number, Decimal Digit" (Nd) category, as listed here:

http://www.fileformat.info/info/unicode/category/Nd/list.htm

Belonging to that category doesn't mean that a character is a valid numeric character in the current locale. In fact, int.Parse() can ONLY parse the ASCII digits 0 through 9, regardless of the locale setting.

For example, this doesn't work:

int test = int.Parse("٣", CultureInfo.GetCultureInfo("ar"));

Even though ٣ is a valid Arabic digit character, and "ar" is the Arabic locale identifier.

The Microsoft article "How to: Parse Unicode Digits" states that:

The only Unicode digits that the .NET Framework parses as decimals are the ASCII digits 0 through 9, specified by the code values U+0030 through U+0039. The .NET Framework parses all other Unicode digits as characters.
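
Given that statement, a pre-check that mirrors what int.Parse accepts would test for the ASCII range only rather than use Char.IsDigit. The following is a minimal sketch, not a definitive implementation; the class and the helper names IsAsciiDigit and IsAllAsciiDigits are hypothetical and do not come from the question or from the framework documentation quoted above.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class AsciiDigitCheck
{
    // Hypothetical helper: true only for U+0030 through U+0039.
    public static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';

    // Hypothetical helper: true if the string is non-empty and all ASCII digits.
    public static bool IsAllAsciiDigits(string s) =>
        !string.IsNullOrEmpty(s) && s.All(IsAsciiDigit);

    public static void Main()
    {
        // "\u06F5" is the character '۵' from the question.
        string persianFive = "\u06F5";

        Console.WriteLine(persianFive.All(char.IsDigit));   // True
        Console.WriteLine(IsAllAsciiDigits(persianFive));   // False
        Console.WriteLine(IsAllAsciiDigits("123"));         // True

        // int.TryParse agrees with the ASCII-only check.
        Console.WriteLine(int.TryParse(persianFive, NumberStyles.Integer,
            CultureInfo.InvariantCulture, out _));          // False
    }
}
```

Passing this check still does not guarantee that int.Parse succeeds, since a long run of digits can overflow an int, so int.TryParse remains the safer final step.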

However, note that you can use char.GetNumericValue() to convert a Unicode numeric character to its numeric equivalent as a double.

The return value is a double rather than an int because of characters like this:

Console.WriteLine(char.GetNumericValue('¼')); // Prints 0.25

You could use something like this to convert all decimal digit characters in a string into their ASCII equivalents:

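// Requires: using System.Text;
public string ConvertNumericChars(string input)
{
    StringBuilder output = new StringBuilder();

    foreach (char ch in input)
    {
        // Only decimal digits (Unicode category Nd) are candidates for conversion.
        if (char.IsDigit(ch))
        {
            double value = char.GetNumericValue(ch);

            // Guard: accept only whole values 0 through 9 before mapping to ASCII.
            if ((value >= 0) && (value <= 9) && (value == (int)value))
            {
                output.Append((char)('0' + (int)value));
                continue;
            }
        }

        // Everything else is copied through unchanged.
        output.Append(ch);
    }

    return output.ToString();
}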
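
As a variation on the method above, the intermediate string can be skipped by accumulating the value of each digit directly. This is only a sketch under stated assumptions: the class and the method name TryParseUnicodeDigits are hypothetical, the input is assumed to be unsigned with no whitespace or separators, and digits from different scripts are accepted in the same string, which may or may not be what you want.

```csharp
using System;

public static class UnicodeDigitParser
{
    // Hypothetical helper: parses a non-empty run of Unicode decimal digits
    // (category Nd) into an int. Returns false on any other character,
    // on empty input, or on overflow.
    public static bool TryParseUnicodeDigits(string s, out int result)
    {
        result = 0;
        if (string.IsNullOrEmpty(s)) return false;

        long acc = 0;
        foreach (char ch in s)
        {
            if (!char.IsDigit(ch)) return false;

            // For decimal digits, GetNumericValue is a whole number from 0 to 9.
            acc = acc * 10 + (int)char.GetNumericValue(ch);
            if (acc > int.MaxValue) return false;
        }

        result = (int)acc;
        return true;
    }

    public static void Main()
    {
        int value;

        // "\u0663" is the Arabic-Indic digit three used earlier in this answer.
        Console.WriteLine(TryParseUnicodeDigits("\u0663", out value));        // True
        Console.WriteLine(value);                                              // 3

        // Mixed scripts: Arabic-Indic three followed by '۵' (U+06F5).
        Console.WriteLine(TryParseUnicodeDigits("\u0663\u06F5", out value));  // True
        Console.WriteLine(value);                                              // 35

        // '¼' (U+00BC) is numeric but not a decimal digit, so it is rejected.
        Console.WriteLine(TryParseUnicodeDigits("\u00BC", out value));        // False
    }
}
```

The accumulator is a long so that the overflow check can run after each step without the intermediate value itself overflowing.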

Matthew Watson answered Nov 10 '22 00:11