
How to generate repetitive bit pattern (strings) & export into text file?

I'm trying to generate a repetitive bit pattern (as a string) and export it to a text file.

Here's my code:

string pattern_01010101 = "";
        
for (int i = 0; i < 10; i++)
{
     pattern_01010101 += "0,1,0,1,0,1,0,1,";
}

System.IO.File.WriteAllText(@"C:\BField_pattern_01010101.txt", pattern_01010101);

Result:

[Image: screenshot of the resulting text file]


Now, if I change the loop count to 20:

string pattern_01010101 = "";
        
for (int i = 0; i < 20; i++)
{
     pattern_01010101 += "0,1,0,1,0,1,0,1,";
}

System.IO.File.WriteAllText(@"C:\BField_pattern_01010101.txt", pattern_01010101);

Result:

I get these funny little rectangular boxes:

[Image: screenshot of the resulting text file]

Could somebody please suggest what I am doing wrong here?

Many thanks for your time. :)

SanVEE asked Nov 18 '13 11:11


3 Answers

I suspect the problem lies in the default encoding used by the File.WriteAllText method.

Try passing the encoding you need explicitly, for instance Encoding.UTF8. That works fine:

File.WriteAllText(@"BField_pattern_01010101.txt", pattern_01010101, Encoding.UTF8);

I looked into it: WriteAllText also uses UTF8Encoding by default; the only difference is in the arguments passed to the constructor. Encoding.UTF8 uses new UTF8Encoding(true, false), whereas the WriteAllText method uses new UTF8Encoding(false, true).

As noted in the comments, the BOM is what makes the difference (thanks, @BjörnRoberg). The first parameter of the UTF8Encoding constructor determines whether a BOM is emitted, so Encoding.UTF8 writes one and the WriteAllText default does not.
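To make the BOM difference concrete, here is a minimal sketch that writes the same text with and without a BOM and prints the first three bytes of each file. The class name BomDemo, the helper LeadingBytes, and the temporary scratch file are assumptions made for illustration; they are not part of the original code.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: writes `text` to a temporary file with the given
    // encoding and returns (at most) the first `count` bytes of that file.
    public static byte[] LeadingBytes(string text, Encoding encoding, int count)
    {
        string path = Path.GetTempFileName(); // scratch file, deleted below
        try
        {
            File.WriteAllText(path, text, encoding);
            byte[] all = File.ReadAllBytes(path);
            byte[] head = new byte[Math.Min(count, all.Length)];
            Array.Copy(all, head, head.Length);
            return head;
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        string text = "0,1,0,1,";

        // First constructor argument true: the file starts with the UTF-8 BOM.
        Console.WriteLine(BitConverter.ToString(
            LeadingBytes(text, new UTF8Encoding(true), 3)));  // EF-BB-BF

        // First constructor argument false: the file starts with the text itself.
        Console.WriteLine(BitConverter.ToString(
            LeadingBytes(text, new UTF8Encoding(false), 3))); // 30-2C-31
    }
}
```

With the BOM present, a reader such as Notepad can identify the encoding from the first three bytes instead of guessing from the content.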

Sriram Sakthivel answered Oct 17 '22 03:10


This looks like a case of "Bush hid the facts". So it's not a bug in your app; it's a bug in Notepad. To work around it, specify the encoding explicitly when you write your file (UTF-8 or Unicode, or anything else that emits a byte order mark). (By default, File.WriteAllText uses UTF-8 without a BOM, so Notepad has to guess the encoding, and the guesswork apparently sometimes fails.)

fejesjoco answered Oct 17 '22 05:10


The problem is not in your application. In fact, if you open Notepad directly, enter 0,1,0,1,0,1,0,1, 20 times, save the file (ANSI encoding), and re-open the file, you will see the same behavior.

By default, the text file will be written in UTF-8 encoding without a Byte Order Mark (BOM). When Notepad opens the file, it first must detect the proper encoding (e.g., Unicode or UTF8) based on only the contents of the text file. This is done based on statistical analysis, using the IsTextUnicode API. The API notes that:

The IS_TEXT_UNICODE_STATISTICS and IS_TEXT_UNICODE_REVERSE_STATISTICS tests use statistical analysis. These tests are not foolproof. The statistical tests assume certain amounts of variation between low and high bytes in a string, and some ASCII strings can slip through.

In the example of 0,1,0,1,0,1,0,1 repeated 20 times, the IsTextUnicode function incorrectly indicated that the text was Unicode (UTF-16) encoded rather than UTF-8 encoded. (This type of false positive is perhaps most infamously present in this bug.)

As evidence, consider the following:

[DllImport("Advapi32", SetLastError = false)]
static extern bool IsTextUnicode(byte[] buf, int len, ref int opt);

...
int iter = 20;
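// Build the same pattern as in the question: "0,1,0,1,0,1,0,1," repeated iter times.
string test = String.Join("", Enumerable.Repeat("0,1,0,1,0,1,0,1,", iter));
var bytes = Encoding.UTF8.GetBytes(test);
// Flag value kept from the original answer. Note: winnt.h defines 0x20 as
// IS_TEXT_UNICODE_REVERSE_STATISTICS; IS_TEXT_UNICODE_STATISTICS is 0x02.
int opt = 0x20;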
Console.WriteLine(IsTextUnicode(bytes, bytes.Length, ref opt));

If iter > 10 (i.e., for more than 10 repetitions), the encoding will be interpreted, incorrectly, as Unicode.
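To see what such a misdetection does to the text, here is a minimal sketch that takes the UTF-8 bytes of the pattern and deliberately decodes them as UTF-16 little-endian, which is what a reader does after guessing "Unicode". The class name MisdecodeDemo and the helper CodePointsWhenReadAsUtf16 are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

public static class MisdecodeDemo
{
    // Hypothetical helper: encodes `text` as UTF-8, then deliberately decodes
    // those bytes as UTF-16 LE and returns the resulting code points.
    public static string[] CodePointsWhenReadAsUtf16(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        string misread = Encoding.Unicode.GetString(bytes); // Encoding.Unicode is UTF-16 LE
        return misread.Select(c => "U+" + ((int)c).ToString("X4")).ToArray();
    }

    public static void Main()
    {
        // Each byte pair such as "0," (0x30 0x2C) becomes one 16-bit character.
        Console.WriteLine(string.Join(" ", CodePointsWhenReadAsUtf16("0,1,0,1,")));
        // prints: U+2C30 U+2C31 U+2C30 U+2C31
    }
}
```

Every pair of ASCII bytes collapses into a single character around U+2C30. Many fonts have no glyphs for those code points, which would be consistent with the rectangular boxes shown in the question, although that last step is an inference rather than something the answers confirm.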

drf answered Oct 17 '22 04:10