There doesn't seem to be an accepted way of sending a non-ASCII header parameter.
The header for file download usually looks like
Content-disposition: attachment; filename="theasciifilename.doc"
However, if you smash a UTF-8 encoded string into the filename parameter, Firefox will handle it fine, whereas IE will throw up.
There is a document on CodeProject that explains a method for encoding the filename.
This document encodes Bản Kiểm Kê.doc to B%e1%ba%a3n%20Ki%e1%bb%83m%20K%c3%aa.doc by hex encoding the bytes.
Problem #1: the first character in that string, ả, has a value of 7843 (&#7843;). Encode that number in hex and you get %a3%1e. How did this guy get %e1%ba%a3? (I'm obviously missing something simple here.)
Problem #2: While IE acknowledges this encoding, Firefox doesn't! What to do?
The specs basically don't permit anything other than US-ASCII. HTTP headers are US-ASCII. HTTP's payload defaults to ISO 8859-1 but that refers to the content body, not the headers.
Arguably the Right Thing to do would be to use MIME's technique for encoding non-ASCII data in headers, as described in RFC 2047, but I have no idea whether browsers actually support that.
EDIT: Whoops, no, RFC 2047 section 5 explicitly says that the encoded form is not permitted in Content-Disposition. Looks like you're out of luck - there is no standard.
EDIT 2: There is a standard - RFC 2231 defines how this is now supposed to work. It has support from some browsers, but is not supported in IE. I found some test cases which demonstrate how it works and what browser support is available.
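As a rough illustration of the RFC 2231 form, the extended parameter is written as filename*=charset'language'percent-encoded-value. The sketch below assumes UTF-8 as the charset and an empty language tag; the helper names are made up for this example, and whether a given browser honours the result still depends on its RFC 2231 support.

```python
from urllib.parse import quote


def extended_filename_param(filename: str) -> str:
    """Build an RFC 2231 style parameter: filename*=charset'language'value.

    Hypothetical helper for illustration. The charset is assumed to be
    UTF-8 and the language tag is left empty.
    """
    # quote() percent-encodes the UTF-8 bytes of every character that is
    # not an unreserved ASCII character (letters, digits, "_", ".", "-", "~").
    encoded = quote(filename, safe="")
    return f"filename*=UTF-8''{encoded}"


def content_disposition(filename: str) -> str:
    """Return a Content-Disposition header value for a file download."""
    return "attachment; " + extended_filename_param(filename)


# "Bản Kiểm Kê.doc", written with escapes so the code points are unambiguous
print(content_disposition("B\u1ea3n Ki\u1ec3m K\u00ea.doc"))
# attachment; filename*=UTF-8''B%E1%BA%A3n%20Ki%E1%BB%83m%20K%C3%AA.doc
```

The percent-encoded bytes are the same ones the CodeProject example produces; only the filename*=UTF-8'' wrapper differs.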
Answer to question #1: You are confusing Unicode code points with UTF-8. The code point of 'ả' is U+1EA3 (decimal 7843); the bytes 0xA3 0x1E are just that number written low byte first, which is not UTF-8. In UTF-8 that character requires three bytes: 0xE1 0xBA 0xA3. URL encoding is poorly defined for non-ASCII characters, but %e1%ba%a3 is the valid percent-encoding of that character's UTF-8 bytes.
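To see the difference between the code point and its UTF-8 bytes, a short Python sketch (the variable names are only for illustration) prints both, along with the little-endian UTF-16 bytes that account for the %a3%1e in the question:

```python
ch = "\u1ea3"  # the character ả

code_point = ord(ch)               # the Unicode code point as an integer
utf8 = ch.encode("utf-8")          # the bytes used by the percent-encoding
utf16le = ch.encode("utf-16-le")   # the code point stored low byte first

# Percent-encode each UTF-8 byte as %xx
percent = "".join(f"%{b:02x}" for b in utf8)

print(code_point, hex(code_point))  # 7843 0x1ea3
print(utf8.hex(" "))                # e1 ba a3
print(utf16le.hex(" "))             # a3 1e
print(percent)                      # %e1%ba%a3
```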