For this piece of code:
String content = String.Empty;
ListenerStateObject state = (ListenerStateObject)ar.AsyncState;
Socket handler = state.workSocket;
int bytesRead = handler.EndReceive(ar);
if (bytesRead > 0)
{
state.sb.Append(Encoding.UTF8.GetString(state.buffer, 0, bytesRead));
content = state.sb.ToString();
...
I'm getting 'Ol?' instead of 'Olá'.
What's wrong with it?
Character encoding tells computers how to interpret digital data as letters, numbers and symbols. This is done by assigning a specific numeric value to each letter, number or symbol; these letters, numbers and symbols are classified as “characters”.
The Unicode standard defines three encoding forms: UTF-8, UTF-16 and UTF-32.
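To make the difference between the three forms concrete, here is a small sketch that counts how many bytes each one needs for 'á' (code point U+00E1). The class and method names are illustrative only; the calls are the standard System.Text.Encoding properties.

```csharp
using System;
using System.Text;

static class EncodingSizes
{
    // Number of bytes the given encoding needs for the text (no BOM/preamble is counted).
    public static int ByteCount(string text, Encoding encoding) => encoding.GetByteCount(text);

    static void Main()
    {
        const string text = "\u00E1"; // 'á'
        Console.WriteLine($"code point: U+{(int)text[0]:X4}");
        Console.WriteLine($"UTF-8:  {ByteCount(text, Encoding.UTF8)} bytes");
        Console.WriteLine($"UTF-16: {ByteCount(text, Encoding.Unicode)} bytes"); // Encoding.Unicode is UTF-16LE
        Console.WriteLine($"UTF-32: {ByteCount(text, Encoding.UTF32)} bytes");
    }
}
```

For 'á' this reports U+00E1 and 2, 2 and 4 bytes respectively, which is why the same character can turn into different byte sequences depending on which encoding the sender picked.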
Character encoding is a method of converting bytes into characters. To validate or display an HTML document properly, a program must choose a proper character encoding.
UTF-8 is the most commonly used encoding on today's computer systems and networks.
Most likely it's the wrong encoding.
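A sketch of what an encoding mismatch looks like, assuming (hypothetically) that the sender encodes with ISO-8859-1 while the receiver decodes with UTF-8 as in the question. In ISO-8859-1, 'á' is the single byte 0xE1, which is not a complete UTF-8 sequence, so the UTF-8 decoder substitutes the replacement character U+FFFD; a console without a suitable font or code page may display that as '?'.

```csharp
using System;
using System.Text;

static class EncodingMismatchDemo
{
    // Encode with the sender's encoding, decode with the receiver's encoding.
    public static string RoundTrip(string text, Encoding sender, Encoding receiver)
    {
        byte[] bytes = sender.GetBytes(text);
        return receiver.GetString(bytes);
    }

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Same encoding on both ends: the text survives intact.
        Console.WriteLine(RoundTrip("Ol\u00E1", Encoding.UTF8, Encoding.UTF8) == "Ol\u00E1");

        // Hypothetical Latin-1 sender, UTF-8 receiver: 0xE1 is replaced with U+FFFD.
        Console.WriteLine(RoundTrip("Ol\u00E1", latin1, Encoding.UTF8) == "Ol\uFFFD");
    }
}
```

Both lines print True. The fix in that scenario is to make both ends agree on one encoding, not to change only the receiver's decoding call.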
But if you use this code to receive blocks of bytes (split by a protocol), it has a serious flaw: there is no guarantee that each block was encoded independently.
Simple case: the boundary between two blocks cuts through a multi-byte encoded character.
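A minimal sketch of that failure, decoding each chunk on its own the way the callback does with `Encoding.UTF8.GetString(state.buffer, 0, bytesRead)`. The split point is hypothetical: 'Olá' is 4F 6C C3 A1 in UTF-8, and here the boundary falls between C3 and A1, the two bytes of 'á'.

```csharp
using System;
using System.Text;

static class SplitChunkDemo
{
    // Decode every received chunk independently, with no state carried between chunks.
    public static string DecodeIndependently(params byte[][] chunks)
    {
        var sb = new StringBuilder();
        foreach (byte[] chunk in chunks)
            sb.Append(Encoding.UTF8.GetString(chunk, 0, chunk.Length));
        return sb.ToString();
    }

    static void Main()
    {
        byte[] all = Encoding.UTF8.GetBytes("Ol\u00E1"); // 4F 6C C3 A1
        byte[] first = { all[0], all[1], all[2] };      // ends with a dangling lead byte
        byte[] second = { all[3] };                      // a lone continuation byte

        // Each half of 'á' is invalid on its own, so each becomes U+FFFD.
        Console.WriteLine(DecodeIndependently(first, second) == "Ol\uFFFD\uFFFD");
    }
}
```

This prints True: the same four bytes decode correctly in one piece but produce two replacement characters when split.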
Best solution: Attach a TextReader to your Stream.
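If you can switch to a blocking `NetworkStream`, a `StreamReader` wrapped around it does the bookkeeping for you. If you want to keep the asynchronous `EndReceive` callback, a swapped-in alternative is to keep one stateful `System.Text.Decoder` per connection, which holds back the bytes of an incomplete character until the next chunk arrives. The `ReceiveState` class below is a hypothetical stand-in for the question's `ListenerStateObject`.

```csharp
using System;
using System.Text;

// Hypothetical per-connection state: the Decoder must survive across callbacks.
class ReceiveState
{
    public readonly Decoder Decoder = Encoding.UTF8.GetDecoder();
    public readonly StringBuilder Sb = new StringBuilder();

    // Call once per chunk returned by EndReceive.
    public void Append(byte[] buffer, int bytesRead)
    {
        // GetMaxCharCount leaves room for bytes left over from the previous chunk.
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(bytesRead)];
        // Incomplete trailing bytes are kept inside the Decoder, not emitted as U+FFFD.
        int written = Decoder.GetChars(buffer, 0, bytesRead, chars, 0);
        Sb.Append(chars, 0, written);
    }
}

static class DecoderDemo
{
    static void Main()
    {
        var state = new ReceiveState();
        state.Append(new byte[] { 0x4F, 0x6C, 0xC3 }, 3); // "Ol" plus the first byte of 'á'
        Console.WriteLine(state.Sb.ToString() == "Ol");
        state.Append(new byte[] { 0xA1 }, 1);             // completes 'á'
        Console.WriteLine(state.Sb.ToString() == "Ol\u00E1");
    }
}
```

In the question's callback, the Decoder would sit next to `state.sb`, and `state.Append(state.buffer, bytesRead)` would replace the `Encoding.UTF8.GetString` line.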
Are you sure that the stream is actually UTF-8 encoded? Try inspecting the raw bytes in the buffer before decoding (for 'Olá' in UTF-8 there should be 4) and see what the actual byte values are.
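A quick way to do that inside the callback is to log a hex dump of `state.buffer` up to `bytesRead`. The helper name is illustrative; `BitConverter.ToString` is the standard .NET call. For reference: UTF-8 gives 4F-6C-C3-A1; ISO-8859-1 or Windows-1252 give 4F-6C-E1 (three bytes); and 4F-6C-3F means the '?' was already in the data, for example because the sender encoded with ASCII, which cannot represent 'á'.

```csharp
using System;
using System.Text;

static class ByteDump
{
    // Hex dump of the first bytesRead bytes, formatted like "4F-6C-C3-A1".
    public static string Hex(byte[] buffer, int bytesRead) =>
        BitConverter.ToString(buffer, 0, bytesRead);

    static void Main()
    {
        byte[] utf8 = Encoding.UTF8.GetBytes("Ol\u00E1");
        Console.WriteLine(Hex(utf8, utf8.Length)); // 4F-6C-C3-A1

        byte[] ascii = Encoding.ASCII.GetBytes("Ol\u00E1");
        Console.WriteLine(Hex(ascii, ascii.Length)); // 4F-6C-3F
    }
}
```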