Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Character encoding

Tags:

c#

encoding

For this piece of code:

String content = String.Empty;
ListenerStateObject state = (ListenerStateObject)ar.AsyncState;
Socket handler = state.workSocket;

int bytesRead = handler.EndReceive(ar);

if (bytesRead > 0)
{
   state.sb.Append(Encoding.UTF8.GetString(state.buffer, 0, bytesRead));

   content = state.sb.ToString();
   ...

I'm geting 'Ol?' instead of 'Olá'

What's wrong with it?

like image 644
RedEagle Avatar asked May 17 '11 11:05

RedEagle


People also ask

What is meant by character encoding?

Character encoding tells computers how to interpret digital data into letters, numbers and symbols. This is done by assigning a specific numeric value to a letter, number or symbol. These letters, numbers, and symbols are classified as “characters”.

What are the 3 types of character encoding?

There are three different Unicode character encodings: UTF-8, UTF-16 and UTF-32.

What does character encoding mean in HTML?

Character encoding is a method of converting bytes into characters. To validate or display an HTML document properly, a program must choose a proper character encoding.

What is the most common character encoding?

UTF-8 is the most commonly used encoding scheme used on today's computer systems and computer networks.


2 Answers

Most likely it's the wrong encoding.

But if you use this code to receive blocks of bytes (split by a protocol) you will have a serious flaw: there is no guarantee that the block were independently encoded.

Simple case: the boundary of 2 blocks cuts through a multi-byte encoded char.

Best solution: Attach a TextReader to your Stream.

like image 91
Henk Holterman Avatar answered Oct 21 '22 10:10

Henk Holterman


Are you sure that the stream is actually utf-8 encoded? Try inspecting the raw bytes in the buffer before encoding (there should be 4) and see what the actual byte values are.

like image 36
JacquesB Avatar answered Oct 21 '22 11:10

JacquesB