I tried this approach without any success.
The code I'm using:
// File name
String filename = String.Format("{0:ddMMyyHHmm}", dtFileCreated);
String filePath = Path.Combine(Server.MapPath("App_Data"), filename + ".txt");

// Process
myObject pbs = new myObject();
pbs.GenerateFile();
// pbs.GeneratedFile is a StringBuilder object

// Save file
Encoding utf8WithoutBom = new UTF8Encoding(true);
TextWriter tw = new StreamWriter(filePath, false, utf8WithoutBom);
foreach (string s in pbs.GeneratedFile.ToArray())
    tw.WriteLine(s);
tw.Close();

// Push Generated File into Client
Response.Clear();
Response.ContentType = "application/vnd.text";
Response.AppendHeader("Content-Disposition", "attachment; filename=" + filename + ".txt");
Response.TransmitFile(filePath);
Response.End();
the result:
It's writing the BOM no matter what, and special chars (like Æ Ø Å) are not correct :-/
I'm stuck!
My objective is to create a file using UTF-8 as Encoding and 8859-1 as CharSet.
Is this really so hard to accomplish, or am I just having a bad day?
All help is greatly appreciated, thank you!
In UTF-8, the BOM is not essential because, unlike in UTF-16 or UTF-32, the order of the bytes that encode a character is fixed, so there is no alternative byte order to signal. The BOM may still occur in UTF-8 encoded text, however, either as a by-product of an encoding conversion or because it was added by an editor.
The UTF-8 encoding without a BOM has the property that a document which contains only characters from the US-ASCII range is encoded byte-for-byte the same way as the same document encoded using the US-ASCII encoding. Such a document can be processed and understood when encoded either as UTF-8 or as US-ASCII.
How to remove the BOM: if you want to remove the byte order mark from a source file, you need a text editor that offers the option of saving without the mark. Open the file that has the BOM in the editor, then save it again without the BOM. The mark should then no longer appear.
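The same clean-up can also be done in code rather than in an editor. The following is only a minimal sketch: the class name `BomStripper` and its method names are made up for illustration, and it assumes the file is small enough to be read into memory in one go.

```csharp
using System;
using System.IO;

public static class BomStripper
{
    // Returns the input without a leading UTF-8 BOM (EF BB BF).
    // Input that does not start with a BOM is returned unchanged.
    public static byte[] StripUtf8Bom(byte[] data)
    {
        bool hasBom = data.Length >= 3
            && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
        if (!hasBom)
            return data;

        byte[] result = new byte[data.Length - 3];
        Array.Copy(data, 3, result, 0, result.Length);
        return result;
    }

    // Hypothetical helper: rewrites the file at 'path' in place without the BOM.
    public static void StripUtf8BomFromFile(string path)
    {
        File.WriteAllBytes(path, StripUtf8Bom(File.ReadAllBytes(path)));
    }
}
```

Working on raw bytes avoids decoding and re-encoding the text, so the rest of the file is left untouched.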
Well, it writes the BOM because you are instructing it to, in the line
Encoding utf8WithoutBom = new UTF8Encoding(true);
The argument true means that the BOM should be emitted. Using
Encoding utf8WithoutBom = new UTF8Encoding(false);
writes no BOM.
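To see the effect of that constructor argument without touching the file system, here is a small sketch that writes through a StreamWriter into a MemoryStream. The class `BomDemo` and the method `WriteText` are names invented for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes 'text' through a StreamWriter using UTF8Encoding(emitBom)
    // and returns the raw bytes that ended up in the stream.
    public static byte[] WriteText(string text, bool emitBom)
    {
        var encoding = new UTF8Encoding(emitBom);
        using (var ms = new MemoryStream())
        {
            using (var writer = new StreamWriter(ms, encoding))
            {
                writer.Write(text);
            }
            // MemoryStream.ToArray still works after the writer has closed the stream.
            return ms.ToArray();
        }
    }
}
```

With emitBom set to true, the output for "A" starts with the three bytes EF BB BF followed by 41; with false, only the single byte 41 is written.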
My objective is to create a file using UTF-8 as Encoding and 8859-1 as CharSet
Sadly, this is not possible: either you write UTF-8 or you don't. As long as the characters you are writing are in the US-ASCII range, the file will look like an ISO 8859-1 file. However, as soon as you output a character outside that range (e.g. ä, ö, ü), UTF-8 writes it as a multibyte sequence, whereas ISO 8859-1 would use a single byte.
To write true ISO-8859-1 use:
Encoding isoLatin1Encoding = Encoding.GetEncoding("ISO-8859-1");
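The difference between the two encodings shows up directly in the bytes. This is a rough sketch only; `EncodingCompare` and its methods are illustrative names, and the characters are written as \u escapes so the example does not depend on how the source file itself is saved.

```csharp
using System;
using System.Text;

public static class EncodingCompare
{
    // The characters from the question: Æ (U+00C6), Ø (U+00D8), Å (U+00C5).
    public const string Sample = "\u00C6\u00D8\u00C5";

    // ISO-8859-1 maps each of these characters to exactly one byte.
    public static byte[] AsLatin1(string text)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetBytes(text);
    }

    // UTF-8 needs two bytes for each character in the U+0080..U+07FF range.
    public static byte[] AsUtf8(string text)
    {
        return new UTF8Encoding(false).GetBytes(text);
    }
}
```

For the sample string, ISO-8859-1 yields the three bytes C6 D8 C5, while UTF-8 yields the six bytes C3 86 C3 98 C3 85. A reader that assumes the wrong one of the two will show garbled characters.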
Edit: After balexandre's comment
I used the following code for testing ...
var filePath = @"c:\temp\test.txt";
var sb = new StringBuilder();
sb.Append("dsfaskd jlsadfj laskjdflasjdf asdkfjalksjdf lkjdsfljas dddd jflasjdflkjasdlfkjasldfl asääääjdflkaslj d f");
Encoding isoLatin1Encoding = Encoding.GetEncoding("ISO-8859-1");
TextWriter tw = new StreamWriter(filePath, false, isoLatin1Encoding);
tw.WriteLine(sb.ToString());
tw.Close();
And the file looks perfectly fine. Obviously, you should use the same encoding when reading the file.