
How to display a non-ascii filename in the file download box in browsers?

Tags:

encoding

utf-8

There doesn't seem to be an accepted way of sending a header parameter in non-ASCII format.

The header for file download usually looks like

Content-disposition: attachment; filename="theasciifilename.doc"

But if you put a UTF-8 encoded string directly in the filename parameter, Firefox handles it fine, whereas IE chokes on it.

There is a document on CodeProject that explains a method for encoding the filename.

This document encodes Bản Kiểm Kê.doc to B%e1%ba%a3n%20Ki%e1%bb%83m%20K%c3%aa.doc by hex encoding the bytes.

Problem #1: the first non-ASCII character in that string, ả, has a value of 7843. Encode that number in hex and you get %a3%1e. How did this guy get %e1%ba%a3? (I'm obviously missing something simple here)

Problem #2: While IE acknowledges this encoding, Firefox doesn't! What to do?

Michael Pryor asked Sep 29 '08 15:09


2 Answers

The specs basically don't permit anything other than US-ASCII. HTTP headers are US-ASCII. HTTP's payload defaults to ISO 8859-1 but that refers to the content body, not the headers.

Arguably the Right Thing to do would be to use MIME's technique for encoding non-ASCII data in headers, as described in RFC 2047, but I have no idea whether browsers actually support that.

EDIT: Whoops, no, RFC 2047 section 5 explicitly says that the encoded form is not permitted in Content-Disposition. Looks like you're out of luck - there is no standard.

EDIT 2: There is a standard - RFC 2231 defines how this is now supposed to work. It has support from some browsers, but is not supported in IE. I found some test cases which demonstrate how it works and what browser support is available.
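To make the RFC 2231 approach concrete, here is a minimal Python sketch of a header value that carries both a plain ASCII `filename` fallback and an extended `filename*` parameter. The `filename*=UTF-8''...` form is the RFC 2231 extended-parameter syntax, which RFC 5987 later profiled for HTTP and RFC 6266 adopted for Content-Disposition. The function name `content_disposition` and the fallback policy (replacing non-ASCII characters with `?`) are assumptions made for illustration, not something prescribed by the specs, and browser behaviour should still be tested.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build a Content-Disposition value with an ASCII fallback and a
    UTF-8 percent-encoded filename* parameter (hypothetical helper)."""
    # ASCII-only fallback for clients that ignore filename*:
    # every non-ASCII character becomes '?'.
    fallback = filename.encode("ascii", "replace").decode("ascii")
    # Avoid breaking the quoted-string: neutralise quotes and backslashes.
    fallback = fallback.replace("\\", "_").replace('"', "_")
    # Percent-encode the UTF-8 bytes; safe="" leaves only unreserved
    # characters (letters, digits, '-', '.', '_', '~') unescaped.
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


# Escapes are used so the example does not depend on source-file encoding.
header_value = content_disposition("B\u1ea3n Ki\u1ec3m K\u00ea.doc")
print("Content-Disposition: " + header_value)
```

Sending both parameters is a hedge: a client that understands `filename*` can use the full name, while one that does not still gets a usable ASCII name.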

Mike Dimmick answered Oct 28 '22 20:10


Answer to question #1: You are confusing Unicode code points with UTF-8. The code point of 'ả' is U+1EA3 (the bytes 0xA3 0x1E in little-endian UTF-16), but that is not its UTF-8 encoding. In UTF-8 that character requires three bytes: 0xE1 0xBA 0xA3. URL encoding is poorly defined for non-ASCII characters, but %e1%ba%a3 is the valid percent-encoded UTF-8 form of that character.
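The difference is easy to check in Python. This is just a small illustrative sketch; the variable names are made up for the example, and the filename is written with escapes so the result does not depend on how the source file is saved.

```python
from urllib.parse import quote

name = "B\u1ea3n Ki\u1ec3m K\u00ea.doc"  # the filename from the question
ch = name[1]                              # the character U+1EA3

code_point = ord(ch)                      # the Unicode code point as an integer
utf16le = ch.encode("utf-16-le").hex()    # the code point as little-endian UTF-16 bytes
utf8 = ch.encode("utf-8").hex()           # the three UTF-8 bytes
percent_encoded = quote(name)             # percent-encodes the UTF-8 bytes

print(code_point, hex(code_point))
print(utf16le)
print(utf8)
print(percent_encoded)
```

Note that `quote` emits uppercase hex digits (`%E1%BA%A3`), which is equivalent to the lowercase form shown in the question.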

Mr. Shiny and New 安宇 answered Oct 28 '22 20:10