Why is File.ReadAllBytes result different than when using File.ReadAllText?

Tags: string, c#, byte

I have a UTF-8 encoded text file whose contents are "test". When I read the file into a byte array and convert it to a string, the result contains one strange character at the start. I use the following code:

using System;
using System.IO;
using System.Text;

var path = @"C:\Users\Tester\Desktop\test\test.txt"; // UTF-8

var bytes = File.ReadAllBytes(path);
var contents1 = Encoding.UTF8.GetString(bytes);

var contents2 = File.ReadAllText(path);

Console.WriteLine(contents1); // result is "?test"
Console.WriteLine(contents2); // result is "test"

contents1 is different from contents2. Why?

Asked by Dragon on Sep 29 '14 at 14:09



3 Answers

As explained in ReadAllText's documentation:

This method attempts to automatically detect the encoding of a file based on the presence of byte order marks. Encoding formats UTF-8 and UTF-32 (both big-endian and little-endian) can be detected.

So the file contains a BOM (byte order mark). ReadAllText interprets it correctly and leaves it out of the returned string, while the first approach just reads the raw bytes without interpreting them at all.

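The documentation for Encoding.GetString says only that it:

decodes all the bytes in the specified byte array into a string

(emphasis mine). That is of course not entirely conclusive, but your example shows that it is to be taken literally: the BOM bytes are decoded as well, which produces a leading U+FEFF character in the string.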
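A minimal sketch of this behaviour, written as C# top-level statements. The byte array is a hypothetical in-memory stand-in for the file from the question (a UTF-8 BOM followed by "test"), not something read from disk:

```csharp
using System;
using System.Text;

// Assumed file contents: UTF-8 BOM (EF BB BF) followed by the ASCII bytes of "test".
byte[] bytes = { 0xEF, 0xBB, 0xBF, 0x74, 0x65, 0x73, 0x74 };

// GetString decodes every byte, so the BOM becomes the character U+FEFF.
string decoded = Encoding.UTF8.GetString(bytes);

Console.WriteLine(decoded.Length);         // 5, not 4
Console.WriteLine(decoded[0] == '\uFEFF'); // True

// One possible workaround: trim the leading U+FEFF after decoding.
string trimmed = decoded.TrimStart('\uFEFF');
Console.WriteLine(trimmed);                // test
```

The console typically cannot render U+FEFF, which would explain the "?" in front of "test".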

Answered by BartoszKP on Oct 27 '22 at 01:10


You are probably seeing the Unicode BOM (byte order mark) at the beginning of the file. File.ReadAllText detects it and strips it off, but Encoding.UTF8.GetString does not.
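If the bytes are already in memory, one way to get the same BOM handling is to decode them through a StreamReader. This is only a sketch: the byte array is an assumed stand-in for the file's contents, and the variable names are illustrative:

```csharp
using System;
using System.IO;
using System.Text;

// Assumed file contents: UTF-8 BOM (EF BB BF) followed by "test".
byte[] bytes = { 0xEF, 0xBB, 0xBF, 0x74, 0x65, 0x73, 0x74 };

string text;
using (var stream = new MemoryStream(bytes))
using (var reader = new StreamReader(stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
{
    // The reader recognises the BOM and leaves it out of the decoded text.
    text = reader.ReadToEnd();
}

Console.WriteLine(text);        // test
Console.WriteLine(text.Length); // 4
```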

Answered by recursive on Oct 27 '22 at 01:10


It's the UTF-8 preamble (the byte order mark, bytes EF BB BF), which marks the file as UTF-8 encoded. ReadAllText doesn't return it because it is an instruction for the decoder rather than part of the text.
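The preamble can be inspected with Encoding.UTF8.GetPreamble() and skipped by hand before decoding. The following is a sketch under the assumption that the data may or may not start with a BOM; the byte array and variable names are hypothetical:

```csharp
using System;
using System.Linq;
using System.Text;

// The preamble that Encoding.UTF8 writes at the start of a file.
byte[] preamble = Encoding.UTF8.GetPreamble();
string preambleHex = BitConverter.ToString(preamble);
Console.WriteLine(preambleHex); // EF-BB-BF

// Assumed file contents: UTF-8 BOM followed by "test".
byte[] bytes = { 0xEF, 0xBB, 0xBF, 0x74, 0x65, 0x73, 0x74 };

// Skip the preamble only if the data actually starts with it.
bool hasBom = bytes.Length >= preamble.Length
              && bytes.Take(preamble.Length).SequenceEqual(preamble);
int offset = hasBom ? preamble.Length : 0;

string text = Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
Console.WriteLine(text); // test
```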

Answered by PhillipH on Oct 27 '22 at 00:10