
Why does every Char static "Is..." have a string overload, e.g. IsWhiteSpace(string, Int32)?

Tags:

c#

http://msdn.microsoft.com/en-us/library/1x308yk8.aspx

This allows me to do this:

var str = "string ";
Char.IsWhiteSpace(str, 6);

Rather than:

Char.IsWhiteSpace(str[6]);

This seemed unusual, so I looked at the decompiled source in Reflector:

[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
public static bool IsWhiteSpace(char c)
{
    if (char.IsLatin1(c))
    {
        return char.IsWhiteSpaceLatin1(c);
    }
    return CharUnicodeInfo.IsWhiteSpace(c);
}

[SecuritySafeCritical]
public static bool IsWhiteSpace(string s, int index)
{
    if (s == null)
    {
        throw new ArgumentNullException("s");
    }
    if (index >= s.Length)
    {
        throw new ArgumentOutOfRangeException("index");
    }
    if (char.IsLatin1(s[index]))
    {
        return char.IsWhiteSpaceLatin1(s[index]);
    }
    return CharUnicodeInfo.IsWhiteSpace(s, index);
}

Three things struck me:

  1. Why does it check only the upper bound, throwing an ArgumentOutOfRangeException there, while an index below 0 would fall through to the string indexer's standard IndexOutOfRangeException?
  2. The presence of SecuritySafeCriticalAttribute. I've read the general blurb about it, but it is still unclear to me what it is doing here and whether it is linked to the upper bound check.
  3. TargetedPatchingOptOutAttribute is not present on the other Is...(char) methods, for example IsLetter, IsNumber, etc.
weston asked Dec 19 '12 15:12


1 Answer

Because not every character fits in a single C# char. For instance, "𠀀" takes two C# chars, so a char-only overload cannot tell you anything about that character. Given a string and an index, the methods can check whether the char at index i is a high surrogate, read the low surrogate at the next index, combine the two according to the UTF-16 decoding algorithm, and retrieve information about the code point U+20000.
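As a rough illustration of the pairing described above, here is a minimal sketch. The class name SurrogateSketch and the helper Combine are hypothetical names made up for this example; the framework's own equivalent is char.ConvertToUtf32. The arithmetic is the standard UTF-16 surrogate decoding, not necessarily how the framework implements it internally.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class SurrogateSketch
{
    // Combines a UTF-16 surrogate pair into a single code point by hand.
    public static int Combine(char high, char low)
    {
        if (!char.IsHighSurrogate(high) || !char.IsLowSurrogate(low))
        {
            throw new ArgumentException("Expected a high surrogate followed by a low surrogate.");
        }
        // Each surrogate carries 10 bits; the pair encodes (code point - 0x10000).
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void Main()
    {
        string s = "\U00020000"; // the character from the answer, stored as two C# chars

        Console.WriteLine(s.Length);                                 // 2
        Console.WriteLine(char.IsHighSurrogate(s, 0));               // True
        Console.WriteLine(char.IsLowSurrogate(s, 1));                // True
        Console.WriteLine(Combine(s[0], s[1]).ToString("X"));        // 20000
        Console.WriteLine(char.ConvertToUtf32(s, 0).ToString("X"));  // 20000

        // The char overload sees only one half of the pair;
        // the (string, index) overload can look at both halves.
        Console.WriteLine(char.IsLetter(s[0]));
        Console.WriteLine(char.IsLetter(s, 0));
    }
}
```

The last two lines are the point of the comparison: s[0] alone is just a surrogate, whereas the string overload has enough context to classify the whole code point.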

This is how UTF-16 can encode over a million different code points: it is a variable-width encoding. Each character takes either 2 or 4 bytes, that is, 1 or 2 C# chars.
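The widths can be checked directly. This is a small sketch, assuming Encoding.Unicode (UTF-16 little endian) as the encoding; Utf16WidthSketch and Utf16Bytes are hypothetical names used only here.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class Utf16WidthSketch
{
    // Number of bytes the string occupies when encoded as UTF-16 (no byte order mark).
    public static int Utf16Bytes(string s)
    {
        return Encoding.Unicode.GetByteCount(s);
    }

    public static void Main()
    {
        string bmp = "a";                    // fits in one char
        string supplementary = "\U00020000"; // needs a surrogate pair

        Console.WriteLine(bmp.Length + " char, " + Utf16Bytes(bmp) + " bytes");                       // 1 char, 2 bytes
        Console.WriteLine(supplementary.Length + " chars, " + Utf16Bytes(supplementary) + " bytes");  // 2 chars, 4 bytes
    }
}
```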

Esailija answered Oct 30 '22 02:10