I'm trying to generate a bit pattern (a repeated string) and export it to a text file.
Here's my code:
string pattern_01010101 = "";
for (int i = 0; i < 10; i++)
{
pattern_01010101 += "0,1,0,1,0,1,0,1,";
}
System.IO.File.WriteAllText(@"C:\BField_pattern_01010101.txt", pattern_01010101);
Result:
Now, if I change the loop limit to 20:
string pattern_01010101 = "";
for (int i = 0; i < 20; i++)
{
pattern_01010101 += "0,1,0,1,0,1,0,1,";
}
System.IO.File.WriteAllText(@"C:\BField_pattern_01010101.txt", pattern_01010101);
Result:
I get these funny little rectangular boxes.
Could somebody please tell me what I am doing wrong here?
Many thanks for your time. :)
I suspect there is a problem with the default encoding used by the File.WriteAllText method.
Try passing the encoding you need explicitly, for instance Encoding.UTF8; that works fine.
File.WriteAllText(@"BField_pattern_01010101.txt", pattern_01010101, Encoding.UTF8);
I've investigated, and WriteAllText also uses UTF8Encoding by default. The only difference is in the arguments passed to the constructor: Encoding.UTF8 uses new UTF8Encoding(true, false), whereas the WriteAllText method uses new UTF8Encoding(false, true).
As noted in the comments, the BOM is what causes the trouble (thanks, @BjörnRoberg). The first parameter of the UTF8Encoding constructor determines whether a BOM is emitted.
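A minimal sketch of that difference, assuming a plain console program (the class and method names here are made up for illustration). It prints the preamble (BOM) bytes each encoding would emit:

```csharp
using System;
using System.Text;

static class BomDemo
{
    // Hypothetical helper: formats an encoding's preamble (BOM) as hex,
    // or returns an empty string when the encoding emits no preamble.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    static void Main()
    {
        // First constructor argument: emit the UTF-8 identifier (BOM) or not.
        var withBom = new UTF8Encoding(true, false);
        var withoutBom = new UTF8Encoding(false, true);

        Console.WriteLine("with BOM:      '" + PreambleHex(withBom) + "'");       // 'EF-BB-BF'
        Console.WriteLine("without BOM:   '" + PreambleHex(withoutBom) + "'");    // ''
        Console.WriteLine("Encoding.UTF8: '" + PreambleHex(Encoding.UTF8) + "'"); // 'EF-BB-BF'
    }
}
```

The three bytes EF BB BF at the start of a file are what let an editor recognize UTF-8 without guessing.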
Looks like a case of "Bush hid the facts". So it's not a bug in your app; it's a bug in Notepad. When you write your file, specify the encoding explicitly (UTF-8, Unicode, or anything else that writes a byte order mark) to work around it. (By default, File.WriteAllText uses UTF-8 without a BOM, so Notepad has to guess the encoding, and the guess apparently sometimes fails.)
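One way to check this yourself is to look at the first bytes of the written file. The following is only a sketch: the class name, the helper, and the use of a temporary file are assumptions for illustration, not part of the original code.

```csharp
using System;
using System.IO;
using System.Text;

static class FileBomCheck
{
    // Hypothetical helper: true if the file starts with the UTF-8 BOM (EF BB BF).
    public static bool StartsWithUtf8Bom(string path)
    {
        byte[] b = File.ReadAllBytes(path);
        return b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF;
    }

    static void Main()
    {
        string text = "0,1,0,1,0,1,0,1,";
        string path = Path.GetTempFileName(); // temporary file, used only for this check

        // Default overload: UTF-8 without a BOM.
        File.WriteAllText(path, text);
        Console.WriteLine(StartsWithUtf8Bom(path)); // False

        // Explicit Encoding.UTF8: the BOM is written first.
        File.WriteAllText(path, text, Encoding.UTF8);
        Console.WriteLine(StartsWithUtf8Bom(path)); // True

        File.Delete(path);
    }
}
```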
The problem is not in your application. In fact, if you open Notepad directly, enter 0,1,0,1,0,1,0,1, 20 times, save the file (ANSI encoding), and re-open it, you will see the same behavior.
By default, the text file is written in UTF-8 encoding without a Byte Order Mark (BOM). When Notepad opens the file, it must first detect the proper encoding (e.g., Unicode or UTF-8) based only on the contents of the file. It does this by statistical analysis, using the IsTextUnicode API. The API documentation notes that:
The IS_TEXT_UNICODE_STATISTICS and IS_TEXT_UNICODE_REVERSE_STATISTICS tests use statistical analysis. These tests are not foolproof. The statistical tests assume certain amounts of variation between low and high bytes in a string, and some ASCII strings can slip through.
In the example of 0,1,0,1,0,1,0,1 repeated 20 times, the IsTextUnicode function incorrectly indicates that the text is Unicode-encoded (UTF-16, in Windows terminology) rather than UTF-8 encoded. (This type of false positive is perhaps most infamously present in this bug.)
As evidence, consider the following:
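// Requires: using System; using System.Linq; using System.Text;
//           using System.Runtime.InteropServices;
[DllImport("Advapi32", SetLastError = false)]
static extern bool IsTextUnicode(byte[] buf, int len, ref int opt);
...
int iter = 20;
// Build the pattern repeated `iter` times.
string test = String.Join("", Enumerable.Repeat("0,1,0,1,0,1,0,1,", iter));
var bytes = Encoding.UTF8.GetBytes(test);
int opt = 0x20; // IS_TEXT_UNICODE_STATISTICS
// Prints True when the statistical test classifies the bytes as Unicode.
Console.WriteLine(IsTextUnicode(bytes, bytes.Length, ref opt));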
If iter > 10 (i.e., for more than 10 repetitions), the bytes are incorrectly interpreted as Unicode.
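To see why a misdetected file shows up as boxes, the sketch below decodes the same bytes as UTF-16LE, which is what "Unicode" means to Notepad. The class and method names are invented for illustration. Each byte pair such as "0," (0x30 0x2C) collapses into a single code unit, and fonts that lack a glyph for it draw a rectangle instead.

```csharp
using System;
using System.Linq;
using System.Text;

static class MisreadDemo
{
    // Hypothetical helper: encodes the pattern as UTF-8 (pure ASCII here),
    // then decodes those bytes as UTF-16LE, mimicking the wrong guess.
    public static string Misread(int repetitions)
    {
        string pattern = string.Concat(Enumerable.Repeat("0,1,0,1,0,1,0,1,", repetitions));
        byte[] bytes = Encoding.UTF8.GetBytes(pattern);
        return Encoding.Unicode.GetString(bytes);
    }

    static void Main()
    {
        string misread = Misread(20);

        // 320 bytes become 160 UTF-16 code units.
        Console.WriteLine(misread.Length); // 160

        // "0," -> U+2C30 and "1," -> U+2C31 (low byte first).
        Console.WriteLine("U+{0:X4} U+{1:X4}", (int)misread[0], (int)misread[1]); // U+2C30 U+2C31
    }
}
```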