
How do I make sure the file download I create is UTF-8? (and not UTF-8 without BOM)

I've made a download function that writes messages to a CSV file (code is below). When I open the file in Notepad or Notepad++, I see this:

é NY ø ╬ ║► ░ ê ö

(That is also what is in the database, by the way.)

When I open it in MS Excel, it shows this:

é NY ø ╬ ║► ░ ê ö

When I open it in Notepad++, it says the file is encoded as 'UTF-8 without BOM'. When I convert it (in Notepad++) to UTF-8, all goes well (that is, Excel shows the right characters too).

But how can I make sure that the file I create from my code is UTF-8?

This is my code:

public ActionResult DownloadPersonalMessages()
{    
    StringBuilder myCsv = new StringBuilder();
    myCsv.Append(new DownloadService().GetPersonalMessages());

    this.Response.ContentType = "text/csv";
    Response.AddHeader("content-disposition", "attachment; filename=PersonalMessages.csv");
    Response.ContentEncoding = Encoding.UTF8;
    Response.Write(myCsv.ToString());
    Response.Flush();
    Response.HeaderEncoding = Encoding.UTF8;
    return Content("");
}

Edit:

My function now returns a byte array, using this conversion:

UTF8Encoding encoding = new UTF8Encoding();
return encoding.GetBytes(str);

And my download is now this:

Response.AddHeader("Content-Disposition", "attachment; filename=PersonalMessages.csv");
return File(new DownloadService().GetPersonalMessages(), "text/csv");
Michel asked Nov 26 '10 08:11



2 Answers

Zareth's answer helped the OP, but it didn't actually answer the question. Here's the correct solution, from this other post:

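public ActionResult Download()
{
    // Needs System.Linq (Concat, ToArray) and System.Text (Encoding) in scope.
    var data = Encoding.UTF8.GetBytes("some data");
    // Prepend the UTF-8 byte-order mark (EF BB BF) so Excel detects the encoding.
    var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
    return File(result, "application/csv", "foo.csv");
}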

The byte-order mark (while not technically required for UTF-8) tells certain programs (e.g. Excel >2007) that the file is UTF-8. You have to include it manually via the GetPreamble() method.
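
As a minimal sketch of what the snippet above produces, the preamble can be prepended in a small helper and the first three bytes inspected. The class and method names (CsvBom, WithUtf8Bom) and the sample string are hypothetical, not from the original post, and this is a plain console program rather than an MVC action:

```csharp
using System;
using System.Linq;
using System.Text;

public static class CsvBom
{
    // Hypothetical helper: UTF-8 bytes of the text, prefixed with the UTF-8 BOM.
    public static byte[] WithUtf8Bom(string csv)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble(); // EF BB BF
        byte[] data = Encoding.UTF8.GetBytes(csv);     // GetBytes itself adds no BOM
        return preamble.Concat(data).ToArray();
    }

    public static void Main()
    {
        byte[] bytes = WithUtf8Bom("é;ø;ö");
        // Show the first three bytes as hex.
        Console.WriteLine(BitConverter.ToString(bytes.Take(3).ToArray())); // EF-BB-BF
    }
}
```

In a controller, the array returned by such a helper would be what gets passed to File(...), as in the answer's code.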

StriplingWarrior answered Oct 16 '22 14:10


You might want to try using the UTF8Encoding class. Its constructor has a parameter that determines whether it provides the BOM. You'll probably have to use the GetBytes method and write the string to the response as a series of bytes, rather than converting it back into a .NET string object.
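
One caveat worth checking, sketched here as a plain console program (the class name BomFlagDemo is hypothetical): the constructor flag of UTF8Encoding controls what GetPreamble() returns, but GetBytes() does not write the BOM in either case, so the preamble still has to be sent explicitly.

```csharp
using System;
using System.Text;

public static class BomFlagDemo
{
    // Length of the preamble for a UTF8Encoding built with the given flag.
    public static int PreambleLength(bool emitBom)
    {
        return new UTF8Encoding(emitBom).GetPreamble().Length;
    }

    public static void Main()
    {
        Console.WriteLine(PreambleLength(true));   // 3
        Console.WriteLine(PreambleLength(false));  // 0
        // GetBytes returns only the encoded text, even with the flag set.
        Console.WriteLine(new UTF8Encoding(true).GetBytes("a").Length); // 1
    }
}
```

The parameterless constructor used in the question's edit, new UTF8Encoding(), behaves like the false case, which would explain why the download still came out as UTF-8 without BOM.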

Zareth answered Oct 16 '22 14:10