 

How to encode Unicode so both iPad and Excel can understand?

Tags:

c#

asp.net

ipad

I have a CSV that is encoded as UTF-32. When I stream it to IE and open it with Excel, I can read everything. On the iPad, the same stream gives me a blank page with no content whatsoever. (I don't know how to view source on the iPad, so there could be something hidden in the HTML.)

The HTTP response is written in ASP.NET (C#):

Response.Clear();
Response.Buffer = true;

Response.ContentType = "text/comma-separated-values";
Response.AddHeader("Content-Disposition", "attachment;filename=\"InventoryCount.csv\"");

Response.RedirectLocation = "InventoryCount.csv";
Response.ContentEncoding = Encoding.UTF32;//works on Excel wrong in iPad
//Response.ContentEncoding = Encoding.UTF8;//works on iPad wrong in Excel

Response.Charset = "UTF-8";//tried also adding Charset just to see if it works somehow, but it does not.
EnableViewState = false;

NMDUtilities.Export oUtilities = new NMDUtilities.Export();

Response.Write(oUtilities.DataGridToCSV(gvExport, ","));

Response.End();

The only guess I can make is that the iPad cannot read UTF-32. Is that true? And how can I view source on the iPad?


UPDATE
I just made an interesting discovery. When my encoding is UTF8 things work on iPad and characters are displayed properly, but Excel messes up a character. But when I use UTF32 the inverse is true. iPad displays nothing, but Excel works perfectly. I really have no idea what I can do about this.

iPad UTF8 outputs = " Quattrode® "
Excel UTF8 outputs = " QuattrodeÂ® "

iPad UTF32 outputs = " "
Excel UTF32 outputs = " Quattrode® "

Here's my implementation of DataGridToCsv

public string DataGridToCsv(GridView input, string delimiter)
{
    StringBuilder sb = new StringBuilder();

    //iterate Gridview and put row results in stringbuilder...
    string result = HttpUtility.HtmlDecode(sb.ToString());
    return result;
}


UPDATE2
Excel is barfing on UTF8 >:{. Man. I just undid the second option he lists because it doesn't work on iPad. I can't win for losing on this one.

UPDATE3
Per your suggestions I have looked at the hex code. There is no BOM, but there is a difference between the file layouts.

UTF8
4D 61 74 65 (MATE from the first word MATERIAL)
UTF32
4D 00 00 00 (M from the first word MATERIAL)

So it looks like UTF32 lays things out in 32 bits vs UTF8 doing it in 8 bits. I think this is why Excel can guess. Now I will try your suggested fixes.
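
The layout difference in the hex dump can be reproduced outside ASP.NET. This is only a small console sketch, not the export code itself: it assumes a recent .NET SDK (top-level statements) and uses the literal "MATE" as a stand-in for the start of the real CSV.

```csharp
using System;
using System.Text;

// Stand-in for the first four characters of the CSV (assumption for illustration).
string text = "MATE";

// GetBytes never emits a BOM, so these are the bare encoded bytes.
byte[] utf8 = new UTF8Encoding(false).GetBytes(text);
// UTF32Encoding(bigEndian: false, byteOrderMark: false) = little-endian UTF-32.
byte[] utf32 = new UTF32Encoding(false, false).GetBytes(text);

string utf8Hex = BitConverter.ToString(utf8).Replace("-", " ");
string utf32Hex = BitConverter.ToString(utf32).Replace("-", " ");

Console.WriteLine("UTF-8 : " + utf8Hex);   // one byte per ASCII character
Console.WriteLine("UTF-32: " + utf32Hex);  // four bytes per character
```

Every ASCII character takes one byte in UTF-8 and four bytes (three of them zero) in little-endian UTF-32, which matches the two dumps above.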

P.Brian.Mackey asked Jan 20 '23 09:01


2 Answers

The problem is that the browser knows your data's encoding is UTF-8, but it has no way of telling Excel. When Excel opens the file, it assumes your system's default encoding. If you copy some non-ASCII text, paste it in Notepad, and save it with UTF-8 encoding, though, you'll see that Excel can properly detect it. It works on the iPad because its default encoding just happens to be UTF-8.

The reason is that Notepad puts the proper byte order mark (EF BB BF for UTF-8) in the beginning of the file. You can try it yourself by using a hex editor or some other means to create a file containing

EF BB BF 20 51 75 61 74 74 72 6F 64 65 C2 AE 20

and opening that file in Excel. (I used Excel 2010, but I assume it would work with all recent versions.)

Try making sure your output starts with those first 3 bytes.


How to write a BOM in C#
    byte[] BOM = new byte[] { 0xef, 0xbb, 0xbf };
    Response.BinaryWrite(BOM);//write the BOM first
    Response.Write(utility.DataGridToCSV(gvExport, ","));//then write your CSV
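
If you would rather not hard-code the three bytes, the framework can supply them. The following is a standalone console sketch (not ASP.NET code, and assuming a recent .NET SDK with top-level statements) that builds the BOM-prefixed output in memory and prints it as hex, so it can be compared with the byte sequence shown earlier. The sample string is the one from the question.

```csharp
using System;
using System.IO;
using System.Text;

// Sample text from the question; \u00AE is the registered-trademark sign.
string csv = " Quattrode\u00AE ";

// A UTF8Encoding created with 'true' reports the 3-byte BOM as its preamble.
var utf8WithBom = new UTF8Encoding(true);
byte[] bom = utf8WithBom.GetPreamble();
byte[] body = utf8WithBom.GetBytes(csv);   // GetBytes itself does not add a BOM

// Write the BOM first, then the encoded text, as the answer suggests.
var buffer = new MemoryStream();
buffer.Write(bom, 0, bom.Length);
buffer.Write(body, 0, body.Length);

string hex = BitConverter.ToString(buffer.ToArray()).Replace("-", " ");
Console.WriteLine(hex);
```

In the page itself, the equivalent would presumably be passing Encoding.UTF8.GetPreamble() to Response.BinaryWrite in place of the hand-written array, with Response.ContentEncoding set to UTF-8 so the text that follows is encoded to match.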
Gabe answered Jan 22 '23 00:01


Excel tries to infer the encoding from the file contents, and ASCII and UTF-8 are identical for the first 128 characters (which cover unaccented letters, digits, and common punctuation). With UTF-16 or UTF-32, Excel can tell that the content isn't ASCII. Most of your UTF-8 content looks exactly like ASCII, though, so if you want the file read as UTF-8 you have to say so explicitly by writing the byte order mark, as Gabe said in his answer. Also, see the answer by Andrew Csontos on this other question:

What's the best way to export UTF8 data into Excel?
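
To see what a wrong guess does to the one non-ASCII character, here is a small console sketch (assuming a recent .NET SDK with top-level statements). ISO-8859-1 is used only as a stand-in for a Western system default code page. That is an assumption, since the real default depends on the machine, but it agrees with Windows-1252 for the two bytes involved.

```csharp
using System;
using System.Text;

// UTF-8 bytes of the sample text, without a BOM.
byte[] utf8Bytes = new UTF8Encoding(false).GetBytes("Quattrode\u00AE");

// Stand-in for the "system default" encoding (assumption for illustration).
Encoding legacy = Encoding.GetEncoding("iso-8859-1");

// Decoding as a single-byte code page turns the two bytes C2 AE into two characters.
string misread = legacy.GetString(utf8Bytes);
// Decoding as UTF-8 recovers the original single character.
string correct = Encoding.UTF8.GetString(utf8Bytes);

Console.WriteLine(misread.Length);  // one character longer than the original
Console.WriteLine(correct.Length);
```

The ASCII part survives either way. Only the ® sign, which UTF-8 stores as the two bytes C2 AE, comes out as two separate characters when the file is read with a single-byte code page.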

Joel C answered Jan 22 '23 00:01