
How does Encoding.Default work in .NET?

I'm reading a file using:

var source = File.ReadAllText(path);

and the character © wasn't being loaded correctly.

Then, I changed it to:

var source = File.ReadAllText(path, Encoding.UTF8);

and it made no difference.

I decided to try using

var source = File.ReadAllText(path, Encoding.Default);

and it worked perfectly. Then I debugged it and tried to find which Encoding did the trick, and I found that it was UTF-7.

What I want to know is:

Is it recommended to use Encoding.Default, and can it guarantee all the characters of the file will be read without problems?

Oscar Mederos asked May 15 '11 04:05



1 Answer

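Encoding.Default only guarantees that characters belonging to one particular encoding will be read correctly, and that encoding is not UTF-7. On .NET Framework it is the encoding for the operating system's current ANSI code page (for example Windows-1252 on many Western European and US systems); on .NET Core and later it is always UTF-8. It worked here presumably because the file had been saved in that ANSI code page. On the other hand, if you try to read a file that is not encoded as UTF-8 in UTF-8 mode, you'll get corrupted characters like you did.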

For instance, if the file is encoded as UTF-16 and you read it as UTF-16, you'll be fine, even if it contains nothing but plain ASCII characters. It all boils down to the file's encoding.
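To illustrate how the result depends on the encoding used for decoding, here is a minimal sketch that decodes the same two bytes as ISO-8859-1 and as UTF-8. The class and method names are made up for illustration, and ISO-8859-1 is only a stand-in for whatever single-byte code page a real file might have been saved in.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Decodes the same bytes two ways; the result depends on the encoding chosen.
    public static (string AsLatin1, string AsUtf8) DecodeBothWays(byte[] bytes)
    {
        // Assumption: ISO-8859-1 stands in for a single-byte ANSI code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return (latin1.GetString(bytes), Encoding.UTF8.GetString(bytes));
    }

    public static void Main()
    {
        // 0xA9 is © in ISO-8859-1 (and in Windows-1252),
        // but on its own it is not a valid UTF-8 sequence.
        byte[] bytes = { 0x41, 0xA9 };
        var (asLatin1, asUtf8) = DecodeBothWays(bytes);

        Console.WriteLine(asLatin1 == "A\u00A9"); // True: decoded as "A©"
        Console.WriteLine(asUtf8 == "A\uFFFD");   // True: the invalid byte becomes U+FFFD
    }
}
```

The bytes never change; only the interpretation does, which is why guessing the encoding at read time cannot be made reliable.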

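You need to save and reopen the file with the same encoding to be safe from corruption. If you control how the file is written, prefer UTF-8: it can represent every Unicode character, it is compact for mostly-ASCII text, and it is what File.ReadAllText and File.WriteAllText use when you don't specify an encoding. Avoid UTF-7; it is obsolete, and Encoding.UTF7 is marked as obsolete in .NET 5 and later.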
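As a rough sketch of the save and reopen advice, the helper below writes a string with one encoding and reads it back with another. The names and the temporary file are hypothetical, and ISO-8859-1 is again just a stand-in for a legacy single-byte code page.

```csharp
using System;
using System.IO;
using System.Text;

public static class RoundTripDemo
{
    // Writes text with one encoding and reads it back with another.
    public static string WriteThenRead(string text, Encoding writeEncoding, Encoding readEncoding)
    {
        string path = Path.GetTempFileName(); // hypothetical scratch file
        try
        {
            File.WriteAllText(path, text, writeEncoding);
            return File.ReadAllText(path, readEncoding);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        const string text = "Copyright \u00A9 2011";
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Same encoding on both sides: the text survives.
        Console.WriteLine(WriteThenRead(text, latin1, latin1) == text);        // True
        // Written as ISO-8859-1 but read as UTF-8: the © is lost.
        Console.WriteLine(WriteThenRead(text, latin1, Encoding.UTF8) == text); // False
    }
}
```

Note that File.ReadAllText still honors a byte order mark when the file has one, so a mismatch like this shows up mainly with files that carry no BOM.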

Teoman Soygul answered Sep 27 '22 22:09