http://msdn.microsoft.com/en-us/library/1x308yk8.aspx
This allows me to do this:
var str = "string ";
Char.IsWhiteSpace(str, 6);
Rather than:
Char.IsWhiteSpace(str[6]);
This seemed unusual, so I looked at the decompiled source:
[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
public static bool IsWhiteSpace(char c)
{
    if (char.IsLatin1(c))
    {
        return char.IsWhiteSpaceLatin1(c);
    }
    return CharUnicodeInfo.IsWhiteSpace(c);
}

[SecuritySafeCritical]
public static bool IsWhiteSpace(string s, int index)
{
    if (s == null)
    {
        throw new ArgumentNullException("s");
    }
    if (index >= s.Length)
    {
        throw new ArgumentOutOfRangeException("index");
    }
    if (char.IsLatin1(s[index]))
    {
        return char.IsWhiteSpaceLatin1(s[index]);
    }
    return CharUnicodeInfo.IsWhiteSpace(s, index);
}
Three things struck me:
The bounds check covers only the upper bound, throwing ArgumentOutOfRangeException, while an index below 0 would give string's standard IndexOutOfRangeException.
SecuritySafeCriticalAttribute, which I've read the general blurb about, but I'm still unclear what it is doing here and whether it is linked to the upper bound check.
TargetedPatchingOptOutAttribute is not present on other Is...(char) methods, for example IsLetter, IsNumber, etc.

Because not every character fits in a C# char. For instance, "𠀀" takes 2 C# chars, and you couldn't get any information about that character with just a char overload. With a String and an index, the methods can see whether the char at index i is a high surrogate, then read the low surrogate at the next index, combine the two according to the UTF-16 surrogate-pair algorithm, and retrieve info about the code point U+20000.
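To make the surrogate step concrete, here is a minimal sketch of how a (string, int) method could turn two chars into one code point. The class SurrogateDemo and its methods Combine and CodePointAt are hypothetical names made up for illustration; they are not the framework's internals. The built-in char.ConvertToUtf32(string, int) performs the same combination.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not how the framework is implemented.
static class SurrogateDemo
{
    // Combine a high/low surrogate pair into a single code point by hand.
    public static int Combine(char high, char low)
    {
        if (!char.IsHighSurrogate(high) || !char.IsLowSurrogate(low))
            throw new ArgumentException("Not a valid surrogate pair.");
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Return the code point starting at s[index], reading two chars when needed.
    public static int CodePointAt(string s, int index)
    {
        if (char.IsHighSurrogate(s[index]) && index + 1 < s.Length
            && char.IsLowSurrogate(s[index + 1]))
        {
            return Combine(s[index], s[index + 1]);
        }
        return s[index];
    }

    static void Main()
    {
        string s = "\uD840\uDC00"; // U+20000 written as its two UTF-16 code units

        Console.WriteLine(s.Length);                        // 2
        Console.WriteLine(CodePointAt(s, 0).ToString("X")); // 20000
        Console.WriteLine(char.ConvertToUtf32(s, 0) == CodePointAt(s, 0)); // True

        // A char overload sees only a lone surrogate...
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(s[0])); // Surrogate
        // ...while a (string, int) overload can look at the whole pair.
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(s, 0));
    }
}
```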
This is how UTF-16 can encode over 1 million different code points: it is a variable-width encoding. It takes 2 or 4 bytes to encode a character, that is, 1 or 2 C# chars.
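The variable width is easy to observe by counting. The sketch below compares chars, UTF-16 bytes, and code points for a one-char and a two-char character. Utf16SizeDemo and its members are hypothetical names for illustration; the calls it relies on (string.Length, Encoding.Unicode.GetByteCount, char.IsSurrogatePair) are standard .NET APIs.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
static class Utf16SizeDemo
{
    // Number of UTF-16 code units (C# chars) in the string.
    public static int CodeUnits(string s) => s.Length;

    // Number of bytes the string occupies when encoded as UTF-16.
    public static int Utf16Bytes(string s) => Encoding.Unicode.GetByteCount(s);

    // Number of code points: a well-formed surrogate pair counts once.
    public static int CodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i)) i++; // skip the low surrogate
            count++;
        }
        return count;
    }

    static void Main()
    {
        string bmp = "a";                // U+0061, fits in one char
        string astral = "\uD840\uDC00";  // U+20000, needs a surrogate pair

        Console.WriteLine($"{CodeUnits(bmp)} {Utf16Bytes(bmp)} {CodePoints(bmp)}");          // 1 2 1
        Console.WriteLine($"{CodeUnits(astral)} {Utf16Bytes(astral)} {CodePoints(astral)}"); // 2 4 1
    }
}
```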