Create Text File Without BOM

I tried this approach without any success

the code I'm using:

    // File name
    String filename = String.Format("{0:ddMMyyHHmm}", dtFileCreated);
    String filePath = Path.Combine(Server.MapPath("App_Data"), filename + ".txt");

    // Process
    myObject pbs = new myObject();
    pbs.GenerateFile();

    // pbs.GeneratedFile is a StringBuilder object

    // Save file
    Encoding utf8WithoutBom = new UTF8Encoding(true);
    TextWriter tw = new StreamWriter(filePath, false, utf8WithoutBom);
    foreach (string s in pbs.GeneratedFile.ToArray())
        tw.WriteLine(s);
    tw.Close();

    // Push Generated File into Client
    Response.Clear();
    Response.ContentType = "application/vnd.text";
    Response.AppendHeader("Content-Disposition", "attachment; filename=" + filename + ".txt");
    Response.TransmitFile(filePath);
    Response.End();

the result:


It's writing the BOM no matter what, and special chars (like Æ Ø Å) are not correct :-/

I'm stuck!

My objective is to create a file using UTF-8 as Encoding and 8859-1 as CharSet

Is this really so hard to accomplish, or am I just having a bad day?

All help is greatly appreciated, thank you!

asked Mar 23 '10 19:03 by balexandre


1 Answer

Well, it writes the BOM because you are instructing it to, in the line

Encoding utf8WithoutBom = new UTF8Encoding(true); 

Passing true means that the BOM should be emitted. Using

Encoding utf8WithoutBom = new UTF8Encoding(false);

instead writes no BOM.
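To see the effect of the constructor argument without touching the file system, here is a minimal sketch that writes the same text through a StreamWriter into a MemoryStream with both settings and prints the raw bytes. The class name BomDemo and the helper WriteWith are made up for illustration; only UTF8Encoding, StreamWriter and MemoryStream come from the framework.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of the original question or answer.
public static class BomDemo
{
    // Writes text through a StreamWriter using the given encoding
    // and returns the bytes that ended up in the stream.
    public static byte[] WriteWith(Encoding encoding, string text)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new StreamWriter(ms, encoding))
            {
                writer.Write(text);
            }
            // ToArray still works after the stream has been closed.
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        byte[] withBom = WriteWith(new UTF8Encoding(true), "abc");
        byte[] withoutBom = WriteWith(new UTF8Encoding(false), "abc");

        Console.WriteLine(BitConverter.ToString(withBom));    // EF-BB-BF-61-62-63
        Console.WriteLine(BitConverter.ToString(withoutBom)); // 61-62-63
    }
}
```

The three leading bytes EF BB BF in the first line are the UTF-8 BOM; they disappear once false is passed to the constructor.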

My objective is to create a file using UTF-8 as Encoding and 8859-1 as CharSet

Sadly, this is not possible: either you write UTF-8 or you don't. As long as the characters you are writing are in the ASCII range, the file is byte-for-byte identical to an ISO 8859-1 file. However, as soon as you output a character outside ASCII (e.g. ä, ö, ü), UTF-8 writes it as a multi-byte sequence, whereas ISO 8859-1 stores it as a single byte.
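The difference is easy to check by encoding the characters from the question (Æ Ø Å) both ways and comparing the bytes. This is only a sketch: the class EncodingSizeDemo and its two helpers are hypothetical names, and the characters are written as Unicode escapes so the result does not depend on how the source file itself is saved.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original question or answer.
public static class EncodingSizeDemo
{
    // Encodes text as UTF-8 (no BOM is involved in GetBytes).
    public static byte[] Utf8Bytes(string text)
    {
        return new UTF8Encoding(false).GetBytes(text);
    }

    // Encodes text as ISO-8859-1 (Latin-1).
    public static byte[] Latin1Bytes(string text)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetBytes(text);
    }

    public static void Main()
    {
        string sample = "\u00C6\u00D8\u00C5"; // Æ Ø Å

        // UTF-8 needs two bytes for each of these characters.
        Console.WriteLine(BitConverter.ToString(Utf8Bytes(sample)));   // C3-86-C3-98-C3-85

        // ISO-8859-1 stores each of them in a single byte.
        Console.WriteLine(BitConverter.ToString(Latin1Bytes(sample))); // C6-D8-C5

        // Pure ASCII text is identical in both encodings.
        Console.WriteLine(BitConverter.ToString(Utf8Bytes("abc")));    // 61-62-63
        Console.WriteLine(BitConverter.ToString(Latin1Bytes("abc")));  // 61-62-63
    }
}
```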

To write true ISO-8859-1 use:

Encoding isoLatin1Encoding = Encoding.GetEncoding("ISO-8859-1"); 

Edit: After balexandre's comment

I used the following code for testing ...

    var filePath = @"c:\temp\test.txt";
    var sb = new StringBuilder();
    sb.Append("dsfaskd jlsadfj laskjdflasjdf asdkfjalksjdf lkjdsfljas dddd jflasjdflkjasdlfkjasldfl asääääjdflkaslj d f");

    Encoding isoLatin1Encoding = Encoding.GetEncoding("ISO-8859-1");

    TextWriter tw = new StreamWriter(filePath, false, isoLatin1Encoding);
    tw.WriteLine(sb.ToString());
    tw.Close();

And the file looks perfectly fine. Obviously, you should use the same encoding when reading the file.
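Why the reading side matters can be sketched in memory: text encoded as ISO-8859-1 decodes back correctly with the same encoding, but not when the bytes are interpreted as UTF-8. The class RoundTripDemo and its RoundTrip helper are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original question or answer.
public static class RoundTripDemo
{
    // Encodes text with one encoding and decodes the bytes with another,
    // mimicking "write the file with X, read it back with Y".
    public static string RoundTrip(string text, Encoding writeEncoding, Encoding readEncoding)
    {
        byte[] stored = writeEncoding.GetBytes(text);
        return readEncoding.GetString(stored);
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        Encoding utf8 = new UTF8Encoding(false);
        string original = "as\u00E4\u00E4"; // "asää"

        // Same encoding on both sides: the text survives.
        Console.WriteLine(RoundTrip(original, latin1, latin1) == original); // True

        // Latin-1 bytes read as UTF-8: the umlauts are not valid UTF-8
        // sequences, so the decoded text no longer matches.
        Console.WriteLine(RoundTrip(original, latin1, utf8) == original);   // False
    }
}
```

When reading an actual file, the same idea applies by passing the encoding explicitly, for example to File.ReadAllText(path, encoding) or to the StreamReader constructor.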

answered Sep 20 '22 00:09 by AxelEckenberger