Take a look at the following C# code:
byte[] StringToBytesToBeHashed(string to_be_hashed) {
    byte[] to_be_hashed_byte_array = new byte[to_be_hashed.Length];
    int i = 0;
    foreach (char cur_char in to_be_hashed)
    {
        to_be_hashed_byte_array[i++] = (byte)cur_char;
    }
    return to_be_hashed_byte_array;
}
(The function above was extracted from these lines of code in the WMSAuth GitHub repo.)
My question is: what does the cast from char to byte do in terms of encoding?
I guess it does nothing in terms of encoding. But does that mean Encoding.Default is the one being used, so that the bytes returned depend on how the framework encodes the underlying string on the specific operating system?
And besides, is the char actually bigger than a byte (I'm guessing 2 bytes) and will actually omit the first byte?
I was thinking of replacing all this with:
Encoding.UTF8.GetBytes(stringToBeHashed)
What do you think?
The .NET Framework uses Unicode to represent all its characters and strings. The integer value of a char (which you may obtain by casting to int) is equivalent to its UTF-16 code unit. For characters in the Basic Multilingual Plane (which constitute the majority of characters you'll ever encounter), this value is the Unicode code point.
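To make the code unit versus code point distinction concrete, here is a minimal sketch. The class name CodeUnitDemo and the method names are made up for illustration; the calls themselves (char.IsHighSurrogate, char.ConvertToUtf32) are standard System.Char members.

```csharp
using System;

static class CodeUnitDemo
{
    // Number of UTF-16 code units (chars) the string holds.
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    // Code point that starts at index 0, combining a surrogate pair if present.
    public static int FirstCodePoint(string s)
    {
        return char.ConvertToUtf32(s, 0);
    }

    static void Main()
    {
        // A BMP character: one char, and its integer value is the code point.
        Console.WriteLine((int)'\u0144');                  // 324 (the letter ń)

        // U+1F600 lies outside the BMP, so it is stored as two chars.
        string emoji = "\U0001F600";
        Console.WriteLine(CodeUnits(emoji));               // 2
        Console.WriteLine(char.IsHighSurrogate(emoji[0])); // True
        Console.WriteLine(char.IsLowSurrogate(emoji[1]));  // True
        Console.WriteLine(FirstCodePoint(emoji).ToString("X")); // 1F600
    }
}
```

For such a character, the per-char cast from the question would produce two unrelated bytes, one for each surrogate.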
"The .NET Framework uses the Char structure to represent a Unicode character. The Unicode Standard identifies each Unicode character with a unique 21-bit scalar number called a code point, and defines the UTF-16 encoding form that specifies how a code point is encoded into a sequence of one or more 16-bit values. Each 16-bit value ranges from hexadecimal 0x0000 through 0xFFFF and is stored in a Char structure. The value of a Char object is its 16-bit numeric (ordinal) value." (Source: Char Structure)
Casting a char to byte will result in data loss for any character whose value is larger than 255. Try running the following simple example to understand why:
char c1 = 'D'; // code point 68
byte b1 = (byte)c1; // b1 is 68
char c2 = 'ń'; // code point 324
byte b2 = (byte)c2; // b2 is 68 too!
// 324 % 256 == 68
Yes, you should definitely use Encoding.UTF8.GetBytes instead.
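As a rough sketch of that replacement, the helper below wraps Encoding.UTF8.GetBytes. The names HashInput and ToUtf8Bytes are hypothetical and do not come from the WMSAuth repo.

```csharp
using System;
using System.Text;

static class HashInput
{
    // Hypothetical stand-in for StringToBytesToBeHashed from the question.
    public static byte[] ToUtf8Bytes(string toBeHashed)
    {
        return Encoding.UTF8.GetBytes(toBeHashed);
    }

    static void Main()
    {
        // ASCII characters still map to a single byte each.
        Console.WriteLine(BitConverter.ToString(ToUtf8Bytes("D")));      // 44

        // U+0144 (ń) becomes two bytes instead of being truncated to 0x44.
        Console.WriteLine(BitConverter.ToString(ToUtf8Bytes("\u0144"))); // C5-84
    }
}
```

One caveat: for non-ASCII input this yields different bytes than the original cast, and therefore a different hash, so whatever verifies the hash presumably has to use the same encoding.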
Casting between byte and char is like using the ISO-8859-1 encoding (= the first 256 characters of Unicode), except it silently loses information when encoding characters beyond U+00FF.
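The equivalence can be checked with a small sketch, assuming the input contains only characters up to U+00FF. Latin1Demo and CastEachChar are illustrative names; Encoding.GetEncoding("ISO-8859-1") is the standard way to obtain that encoding.

```csharp
using System;
using System.Linq;
using System.Text;

static class Latin1Demo
{
    // The same per-char cast as in the question, written with LINQ.
    public static byte[] CastEachChar(string s)
    {
        return s.Select(c => unchecked((byte)c)).ToArray();
    }

    static void Main()
    {
        string text = "caf\u00E9"; // "café": every char is <= U+00FF
        byte[] viaCast = CastEachChar(text);
        byte[] viaLatin1 = Encoding.GetEncoding("ISO-8859-1").GetBytes(text);

        Console.WriteLine(BitConverter.ToString(viaCast));   // 63-61-66-E9
        Console.WriteLine(viaCast.SequenceEqual(viaLatin1)); // True
    }
}
```

The two results agree only inside that range. For characters beyond U+00FF the cast keeps the low byte, while the encoding class applies its own fallback, so the outputs can differ.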
And besides, is the char actually bigger than a byte (I'm guessing 2 bytes) and will actually omit the first byte?
Yes. A C# char = UTF-16 code unit = 2 bytes.
A char represents a 16-bit UTF-16 code unit. Casting a char to a byte yields the lower byte of the character, but both Douglas and dan04 are wrong to say that it will always quietly discard the higher byte. If the higher byte is not zero, the result depends on whether the compiler option "Check for arithmetic overflow/underflow" is set:
using System;

namespace CharTest
{
    class Program
    {
        public static void Main(string[] args)
        {
            ByteToCharTest('s');
            ByteToCharTest('ы');
            Console.ReadLine();
        }

        // Casts c to byte and reports whether the cast succeeded or threw.
        static void ByteToCharTest(char c)
        {
            const string MsgTemplate = "Casting to byte character # {0}: {1}";
            string msgRes = "Success";
            byte b;
            try
            {
                // Throws OverflowException only when overflow checking is enabled.
                b = (byte)c;
            }
            catch (Exception e)
            {
                msgRes = e.Message;
            }
            Console.WriteLine(String.Format(MsgTemplate, (int)c, msgRes));
        }
    }
}
Output with overflow checking:
Casting to byte character # 115: Success
Casting to byte character # 1099: Arithmetic operation resulted in an overflow.
Output without overflow checking:
Casting to byte character # 115: Success
Casting to byte character # 1099: Success
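The same behaviour can be pinned down in code, independent of the project-wide compiler option, with the C# checked and unchecked operators. This is a minimal sketch; OverflowDemo, Truncate and FitsInByte are made-up names.

```csharp
using System;

static class OverflowDemo
{
    // Always keeps only the low byte, whatever the compiler option says.
    public static byte Truncate(char c)
    {
        return unchecked((byte)c);
    }

    // Reports whether c can be cast to byte without losing the high byte.
    public static bool FitsInByte(char c)
    {
        try
        {
            byte b = checked((byte)c); // throws if c > 255
            return b == c;
        }
        catch (OverflowException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(Truncate('\u044B'));   // 75, i.e. 1099 % 256 (the letter ы)
        Console.WriteLine(FitsInByte('s'));      // True
        Console.WriteLine(FitsInByte('\u044B')); // False
    }
}
```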