My code:
string input1;
input1 = Console.ReadLine();
Console.WriteLine("byte output");
byte[] bInput1 = Encoding.Unicode.GetBytes(input1);
for (int x = 0; x < bInput1.Length; x++)
{
    Console.WriteLine("{0} = {1}", x, bInput1[x]);
}
For the input "hello", the byte values it prints are:
104 0 101 0 108 0 108 0 111 0
Is there a reference to the character map where I can make sense of this?
You should read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" at http://www.joelonsoftware.com/articles/Unicode.html
You can find a list of all Unicode characters at http://www.unicode.org but don't expect to be able to read the files there without learning a lot about text encoding issues.
At http://www.unicode.org/charts/ you can find all the Unicode code charts. http://www.unicode.org/charts/PDF/U0000.pdf shows that the code point for 'h' is U+0068. (Another great tool for viewing this data is BabelMap.)
The exact details of UTF-16 encoding can be found at http://unicode.org/faq/utf_bom.html#6 and http://www.ietf.org/rfc/rfc2781.txt. In short, U+0068 is encoded (in UTF-16LE) as 0x68 0x00. In decimal, this is the first two bytes you see: 104 0.
The other characters are encoded similarly.
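To see the mapping for every character at once, here is a minimal sketch that pairs up the bytes and prints the code unit they form. The class name Utf16Demo and the method Describe are made up for illustration, and the sketch assumes the input contains only characters in the Basic Multilingual Plane (no surrogate pairs).

```csharp
using System;
using System.Text;

class Utf16Demo
{
    // Hypothetical helper: lists each UTF-16 code unit of the text
    // next to the two bytes that encode it.
    public static string Describe(string text)
    {
        // Encoding.Unicode is UTF-16 little-endian in .NET.
        byte[] bytes = Encoding.Unicode.GetBytes(text);
        var sb = new StringBuilder();
        for (int i = 0; i < bytes.Length; i += 2)
        {
            // Little-endian: the low byte comes first, the high byte second.
            int codeUnit = bytes[i] | (bytes[i + 1] << 8);
            sb.AppendFormat("'{0}' U+{1:X4} -> {2} {3}",
                (char)codeUnit, codeUnit, bytes[i], bytes[i + 1]);
            sb.AppendLine();
        }
        return sb.ToString();
    }

    static void Main()
    {
        // First line printed: 'h' U+0068 -> 104 0
        Console.Write(Describe("hello"));
    }
}
```

For "hello" every second byte is 0 because all five characters have code points below U+0100, so the high byte of each code unit is zero.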
Finally, a great reference (when trying to understand the various Unicode specifications), apart from the Unicode Standard itself, is the Unicode Glossary.