Can someone explain exactly what is going on with this code:
var letter = 'J';
char c = (char)(0x000000ff & (uint)letter);
I understand it is getting the Unicode representation of the character; however, I don't fully understand the role of:
(0x000000ff & (uint)letter)
What is the purpose of 0x000000ff and the casting of the letter to (uint), and is there a shorthand way of achieving the same result?
Thanks
OK, it looks like most people think this is a bad example. I didn't want to include the whole class, but I suppose I might as well so you can see the context. From Reference Source's WebHeaderCollection:
private static string CheckBadChars(string name, bool isHeaderValue)
{
    if (name == null || name.Length == 0)
    {
        // empty name is invalid
        if (!isHeaderValue)
        {
            throw name == null ?
                new ArgumentNullException("name") :
                new ArgumentException(SR.GetString(SR.WebHeaderEmptyStringCall, "name"), "name");
        }
        // empty value is OK
        return string.Empty;
    }
    if (isHeaderValue)
    {
        // VALUE check
        // Trim spaces from both ends
        name = name.Trim(HttpTrimCharacters);
        // First, check for correctly formed multi-line value
        // Second, check for absence of CTL characters
        int crlf = 0;
        for (int i = 0; i < name.Length; ++i)
        {
            char c = (char)(0x000000ff & (uint)name[i]);
            switch (crlf)
            {
                case 0:
                    if (c == '\r')
                    {
                        crlf = 1;
                    }
                    else if (c == '\n')
                    {
                        // Technically this is bad HTTP. But it would be a breaking change to throw here.
                        // Is there an exploit?
                        crlf = 2;
                    }
                    else if (c == 127 || (c < ' ' && c != '\t'))
                    {
                        throw new ArgumentException(SR.GetString(SR.WebHeaderInvalidControlChars), "value");
                    }
                    break;
                case 1:
                    if (c == '\n')
                    {
                        crlf = 2;
                        break;
                    }
                    throw new ArgumentException(SR.GetString(SR.WebHeaderInvalidCRLFChars), "value");
                case 2:
                    if (c == ' ' || c == '\t')
                    {
                        crlf = 0;
                        break;
                    }
                    throw new ArgumentException(SR.GetString(SR.WebHeaderInvalidCRLFChars), "value");
            }
        }
        if (crlf != 0)
        {
            throw new ArgumentException(SR.GetString(SR.WebHeaderInvalidCRLFChars), "value");
        }
    }
    else
    {
        // NAME check
        // First, check for absence of separators and spaces
        if (name.IndexOfAny(InvalidParamChars) != -1)
        {
            throw new ArgumentException(SR.GetString(SR.WebHeaderInvalidHeaderChars), "name");
        }
        // Second, check for non CTL ASCII-7 characters (32-126)
        if (ContainsNonAsciiChars(name))
        {
            throw new ArgumentException(SR.GetString(SR.WebHeaderInvalidNonAsciiChars), "name");
        }
    }
    return name;
}
The bit of interest is:
char c = (char)(0x000000ff & (uint)name[i]);
You're parsing HTTP headers, right? That means you shouldn't be using (any) Unicode encoding.
HTTP headers must be 7-bit ASCII (unlike the request body). That means you should be using the ASCII encoding instead of the default. So while you are parsing the request bytes, you have to use Encoding.ASCII.GetString instead of Encoding.Default.GetString. Hopefully, you're not using StreamReader - that would be a bad idea for quite a few reasons, including the (likely) encoding mismatch between the headers and the body of the request.
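For example, a minimal sketch (the header line and variable names here are hypothetical, just for illustration) of decoding raw header bytes as ASCII:

using System;
using System.Text;

class HeaderDecodeDemo
{
    static void Main()
    {
        // Hypothetical raw header bytes, e.g. as read from a socket.
        byte[] headerBytes = Encoding.ASCII.GetBytes("Content-Type: text/plain\r\n");

        // Decode as 7-bit ASCII; HTTP headers are defined to be ASCII,
        // so Encoding.Default (or a StreamReader) is the wrong tool here.
        string headers = Encoding.ASCII.GetString(headerBytes);

        Console.WriteLine(headers);
    }
}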
EDIT:
As for the use in Microsoft source code - yeah, it happens. Don't try to copy those kinds of things over - it is a hack. Remember, you don't have the test suites and quality assurance Microsoft engineers have, so even if it does in fact work, you're better off not copying such hacks.
I assume it's handled this way because of the use of string for something that in principle should be either an "ASCII string" or just a byte[]. Since .NET only supports Unicode strings, this was seen as the lesser evil. Indeed, that's why the code explicitly checks that the string doesn't contain any non-ASCII characters: it's well aware that the headers must be ASCII, and it will fail explicitly if the string has any non-ASCII characters. It's just the usual tradeoff when writing high-performance frameworks for other people to build on.
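To illustrate the kind of check described, here is a minimal sketch; this ContainsNonAsciiChars is a guess at the shape of the helper called in the code above, not the actual Reference Source implementation:

private static bool ContainsNonAsciiChars(string token)
{
    for (int i = 0; i < token.Length; ++i)
    {
        // Printable 7-bit ASCII is the range 0x20..0x7E (32-126);
        // anything outside it means the string is not a valid header name.
        if (token[i] < 0x20 || token[i] > 0x7E)
        {
            return true;
        }
    }
    return false;
}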
What is the purpose of 0x000000ff and the casting of the letter to (uint)
To get a character whose code falls in the [0..255] range: a char takes 2 bytes in memory, and the 0xFF mask keeps only the low byte.
e.g.:
var letter = (char)4200;                     // ၩ (code 4200 = 0x1068, outside ASCII)
char c = (char)(0x000000ff & (uint)letter);  // 'h' (the low byte, 0x68)
// or
// char c = (char)(0x00ff & (ushort)letter);
// a ushort (2-byte unsigned integer) is enough; uint is a 4-byte unsigned integer
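As for a shorthand: since char converts implicitly to int in C#, the uint cast and the 8-digit mask literal are unnecessary, so this is equivalent:

var letter = (char)4200;
char c = (char)(letter & 0xFF); // 'h' again; letter widens implicitly to int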