I often use Char.IsDigit to check if a char is a digit, which is especially handy in LINQ queries to pre-check int.Parse, as here: "123".All(Char.IsDigit).
But there are chars which are digits but which can't be parsed to int, like ۵.
// true
bool isDigit = Char.IsDigit('۵');
var cultures = CultureInfo.GetCultures(CultureTypes.SpecificCultures);
int num;
// false
bool isIntForAnyCulture = cultures
    .Any(c => int.TryParse('۵'.ToString(), NumberStyles.Any, c, out num));
Why is that? Is my int.Parse precheck via Char.IsDigit thus incorrect?
There are 310 chars which are digits:
List<char> digitList = Enumerable.Range(0, UInt16.MaxValue)
    .Select(i => Convert.ToChar(i))
    .Where(c => Char.IsDigit(c))
    .ToList();
Here's the implementation of Char.IsDigit in .NET 4 (ILSpy):
public static bool IsDigit(char c)
{
    if (char.IsLatin1(c))
    {
        return c >= '0' && c <= '9';
    }
    return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber;
}
So why are there chars that belong to the DecimalDigitNumber category ("Decimal digit character, that is, a character in the range 0 through 9...") which can't be parsed to an int in any culture?
Char.IsDigit(char) indicates whether the specified Unicode character is categorized as a decimal digit. It returns true if so and false otherwise.
Note that a string of digits is still a string of characters and is treated like any other string. The digits are NOT automatically converted into a numeric type.
That's because Char.IsDigit checks for all digits in the Unicode "Number, Decimal Digit" (Nd) category, as listed here:
http://www.fileformat.info/info/unicode/category/Nd/list.htm
Belonging to that category doesn't mean a character is a valid numeric character in the current locale. In fact, int.Parse() can ONLY parse the ASCII digits 0 through 9, regardless of the locale setting.
For example, this doesn't work (it throws a FormatException):
int test = int.Parse("٣", CultureInfo.GetCultureInfo("ar"));
This fails even though ٣ is a valid Arabic digit character and "ar" is the Arabic locale identifier.
The Microsoft article "How to: Parse Unicode Digits" states that:
The only Unicode digits that the .NET Framework parses as decimals are the ASCII digits 0 through 9, specified by the code values U+0030 through U+0039. The .NET Framework parses all other Unicode digits as characters.
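Given that, a precheck that matches what int.Parse accepts has to test for the ASCII range explicitly instead of relying on Char.IsDigit. Here is a minimal sketch. DigitCheck, IsAsciiDigit and IsAsciiDigitString are hypothetical names chosen for illustration, not framework APIs (newer .NET versions, 7 and later as far as I know, ship a built-in char.IsAsciiDigit that does the same range test).

```csharp
using System;
using System.Linq;

public static class DigitCheck
{
    // Hypothetical helper: true only for U+0030..U+0039,
    // the only digits int.Parse accepts.
    public static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    // Hypothetical helper: true for a non-empty string made up
    // solely of ASCII digits.
    public static bool IsAsciiDigitString(string s)
    {
        return !string.IsNullOrEmpty(s) && s.All(IsAsciiDigit);
    }
}
```

Even when this check passes, a long digit string can still overflow an int, so int.TryParse remains the safer call when the input is untrusted.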
However, note that you can use char.GetNumericValue() to convert a Unicode numeric character to its numeric equivalent as a double.
The return value is a double rather than an int because of characters like this:
Console.WriteLine(char.GetNumericValue('¼')); // Prints 0.25
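For decimal digits outside the ASCII range, the same method yields whole numbers. A small sketch to illustrate this; NumericValueDemo is just a hypothetical wrapper class, and the characters are written as escapes to avoid right-to-left rendering issues:

```csharp
using System;

public static class NumericValueDemo
{
    public static void Main()
    {
        // ASCII digit seven
        Console.WriteLine(char.GetNumericValue('7'));
        // EXTENDED ARABIC-INDIC DIGIT FIVE (the char from the question)
        Console.WriteLine(char.GetNumericValue('\u06F5'));
        // ARABIC-INDIC DIGIT THREE
        Console.WriteLine(char.GetNumericValue('\u0663'));
        // Not a numeric character: GetNumericValue returns -1
        Console.WriteLine(char.GetNumericValue('x'));
    }
}
```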
You could use something like this to convert all numeric characters in a string into their ASCII equivalent:
// Requires: using System.Text;
public string ConvertNumericChars(string input)
{
    StringBuilder output = new StringBuilder();
    foreach (char ch in input)
    {
        if (char.IsDigit(ch))
        {
            double value = char.GetNumericValue(ch);
            // Only map whole values 0..9 onto the ASCII digits '0'..'9'.
            if ((value >= 0) && (value <= 9) && (value == (int)value))
            {
                output.Append((char)('0' + (int)value));
                continue;
            }
        }
        // Everything else is copied through unchanged.
        output.Append(ch);
    }
    return output.ToString();
}
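As a usage sketch, the normalized string can then be handed to int.TryParse. This assumes the method above is made static and placed in a hypothetical DigitNormalizer class; TryParseUnicodeInt is likewise an illustrative name, not a framework API.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DigitNormalizer
{
    // Same logic as ConvertNumericChars above, made static for this sketch.
    public static string ConvertNumericChars(string input)
    {
        var output = new StringBuilder(input.Length);
        foreach (char ch in input)
        {
            if (char.IsDigit(ch))
            {
                double value = char.GetNumericValue(ch);
                if (value >= 0 && value <= 9 && value == (int)value)
                {
                    output.Append((char)('0' + (int)value));
                    continue;
                }
            }
            output.Append(ch);
        }
        return output.ToString();
    }

    // Hypothetical wrapper: normalize to ASCII digits, then parse.
    // NumberStyles.None allows digits only (no sign, whitespace or separators).
    public static bool TryParseUnicodeInt(string input, out int result)
    {
        return int.TryParse(ConvertNumericChars(input), NumberStyles.None,
                            CultureInfo.InvariantCulture, out result);
    }
}
```

With this in place, DigitNormalizer.TryParseUnicodeInt("\u0663\u0664", out n) (Arabic-Indic three and four) succeeds and sets n to 34, whereas passing the same string straight to int.TryParse fails.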