Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Write text to file in C# with 513 space characters

Here is a code that writes the string to a file

System.IO.File.WriteAllText("test.txt", "P                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 ");

It's basically the character 'P' followed by a total of 513 space character.

When I open the file in Notepad++, it appears to be fine. However, when I open in windows Notepad, all I see is garbled characters.

If instead of 513 space character, I add 514 or 512, it opens fine in Notepad.

What am I missing?

like image 718
amyn Avatar asked Jan 27 '23 14:01

amyn


1 Answers

What you are missing is that Notepad is guessing, and it is not because your length is specifically 513 spaces ... it is because it is an even number of bytes and the file size is >= 100 total bytes. Try 511 or 515 spaces ... or 99 ... you'll see the same misinterpretation of your file contents. With an odd number of bytes, Notepad can assume that your file is not any of the double-byte encodings, because those would all result in 2 bytes per character = even number of total bytes in the file. If you give the file a few more low-order ASCII characters at the beginning (e.g., "PICKLE" + spaces), Notepad does a much better job of understanding that it should treat the content as single-byte chars.

The suggested approach of including Encoding.UTF8 is the easiest fix ... it will write a BOM to the beginning of the file which tells Notepad (and Notepad++) what the format of the data is, so that it doesn't have to resort to this guessing behavior (you can see the difference between your original approach and the BOM approach by opening both in Notepad++, then look in the bottom-right corner of the app. With the BOM, it will tell you the encoding is UTF-8-BOM ... without it, it will just say UTF-8).

I should also say that the contents of your file are not 'wrong', per se... the weird format is purely due to Notepad's "guessing" algorithm. So unless it's a requirement that people use Notepad to read your file with 1 letter and a large, odd number of spaces ... maybe just don't sweat it. If you do change to writing the file with Encoding.UTF8, then you do need to ensure that any other system that reads your file knows how to honor the BOM, because it is a real change to the contents of your file. If you cannot verify that all consumers of your file can/will handle the BOM, then it may be safer to just understand that Notepad happens to make a bad guess for your specific use case, and leave the raw contents exactly how you want them.

You can verify the physical difference in your file with the BOM by doing a binary read and then converting them to a string (you can't "see" the change with ReadAllText, because it honors & strips the BOM):

byte[] contents = System.IO.File.ReadAllBytes("test.txt");
Console.WriteLine(Encoding.ASCII.GetString(contents));
like image 145
Dusty Avatar answered Jan 31 '23 16:01

Dusty