URL Escaping Chinese/Japanese Unicode Characters for Internet Explorer

Question

I'm trying to URL-escape (percent-encode) non-ASCII characters in several URLs I'm dealing with. I'm working with a Flash application that loads resources like images and sound clips from these URLs. The filenames can contain non-ASCII characters, like so: 日本語.jpg. I escape them by encoding the characters as UTF-8 and then percent-escaping the resulting bytes, to get the following:

%E6%97%A5%E6%9C%AC%E8%AA%9E.jpg
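For readers who want to reproduce the escaped form, here is a minimal sketch of the two steps described above (encode as UTF-8, then percent-escape the bytes). It is written in Python rather than ActionScript purely for illustration; the variable names are hypothetical and this is not the code the application actually uses.

```python
from urllib.parse import quote, unquote

# Filename taken from the question.
name = "日本語.jpg"

# Step 1: the UTF-8 bytes of the filename.
utf8_bytes = name.encode("utf-8")

# Step 2: percent-escape each non-ASCII byte. quote() uses UTF-8 by default
# and leaves unreserved characters such as letters and "." untouched.
escaped = quote(name)

# Decoding with UTF-8 recovers the original filename.
round_trip = unquote(escaped)

print(utf8_bytes.hex(" ").upper())
print(escaped)
print(round_trip)
```

Each Japanese character occupies three bytes in UTF-8, which is why the escaped name contains nine `%XX` groups for three characters.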

These filenames work fine when I run the app in any browser other than Internet Explorer (I've tried Firefox, Safari and Chrome). But when I launch the app in IE (tried both 6 and 8) and it tries to load the sound clip, I get Error #2044: Unhandled ioError, and the URL has been corrupted to something like:

æ¥æ¬èª.jpg

Any thoughts on how to fix this? This is just test-driving the Flash app with local filesystem URLs. I've also noticed that Internet Explorer isn't able to locate a file given as file:///C:/%E6%97%A5%E6%9C%AC%E8%AA%9E.jpg, though Chrome and Firefox will decode it and load the file just fine from the path

C:\日本語.jpg

edit

I think my problem is the same as the one encountered in the following ActionScript code fragment:

import flash.display.Loader;
import flash.net.URLRequest;
...
var ldr:Loader;
var req:URLRequest = new URLRequest("日本語.jpg");
ldr = new Loader();
ldr.load(req);

Using the string 日本語.jpg will work in IE, while using the string %E6%97%A5%E6%9C%AC%E8%AA%9E.jpg works in other browsers. What I need is a single form that will work in all browsers. I have tried the %u encoding, and I have tried setting the HTTP header Content-Type: text/html; charset=utf-8, with no luck in either percent-escaped or unescaped form.

JasonTrue · Accepted Answer

IE uses UTF-8 for HTTP URLs, but I'm not sure about file URLs (even though I tested the behavior as part of the IE team about 10 years ago). If you are using the URLs in HTML, I'd actually recommend trying string literals (if your page encoding is UTF-8) or Numeric Character References (&#dddd;). IE will generally convert the characters into an appropriate encoding, which would be UTF-8 for the HTTP stuff, and UTF-16 for local file system interactions.

It's actually HTTP that needs the URL-escaping, not the HTML parser.
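To make the Numeric Character Reference suggestion concrete, here is a small sketch that rewrites the non-ASCII characters of a filename as decimal references of the &#dddd; form. It is in Python only for illustration, and `to_ncr` is a hypothetical helper name, not something from the question or from any browser API.

```python
def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


name = "日本語.jpg"
ncr = to_ncr(name)

# Python's built-in "xmlcharrefreplace" error handler produces the same form.
builtin_ncr = name.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(ncr)
```

The result is plain ASCII, so it survives whatever encoding the HTML page is saved in; the HTML parser turns the references back into characters before the browser decides how to encode the URL.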

Dave Mateer · Answer

Sorry, no solution, but maybe at least some more information about what might be going on here. (Probably you've already figured this much out, but maybe it will help another reader find a solution.) The "official" URL encoding specification seems to leave the door wide open as to how to decode escaped URLs like the ones you are generating: are the escaped entities intended to represent UTF-8 characters (as Firefox, etc. are interpreting them) or ASCII characters (as IE is interpreting them)? I don't know of any way to force the intended decoding strategy.

Just a question: what bad thing happens if you do not escape them at all, but leave the Unicode in the URL? Although I don't have a lot of experience with it, I seem to remember reading somewhere that the days of needing to escape Unicode in URLs are behind us. Could be wrong about that...
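The two decoding strategies described in this answer can be compared directly. The sketch below (Python, for illustration only) unescapes the same bytes once as UTF-8 and once as a single-byte encoding. Using ISO-8859-1 (Latin-1) for the second case is an assumption: it happens to reproduce the corrupted name from the question once the invisible control characters are dropped, but that does not prove what IE or the Flash plugin does internally.

```python
from urllib.parse import unquote_to_bytes

escaped = "%E6%97%A5%E6%9C%AC%E8%AA%9E.jpg"

# The raw bytes behind the percent-escapes; the escapes themselves carry
# no information about which character encoding was intended.
raw = unquote_to_bytes(escaped)

# Interpretation 1: the bytes are UTF-8 (what Firefox, Safari and Chrome do).
as_utf8 = raw.decode("utf-8")

# Interpretation 2 (assumption): each byte is one Latin-1 character.
as_latin1 = raw.decode("latin-1")

# Bytes 0x80-0x9F map to invisible C1 control characters in Latin-1;
# dropping them gives the string as it would typically be displayed.
visible = "".join(ch for ch in as_latin1 if not 0x80 <= ord(ch) <= 0x9F)

print(as_utf8)
print(visible)
```

Nine bytes become three characters under UTF-8 but nine characters under a single-byte reading, three of which are not printable, which would explain why the corrupted name shows only six odd-looking characters.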

Bear · Answer

Try encoding only the parts of the URI that would cause it to be parsed incorrectly. For instance, encode &, ?, and space. Leave everything else as is, and it should work like a charm.

If you are still running into problems, you may need to set the charset to UTF-8 in your HTTP headers. Something like Content-Type: text/html; charset=UTF-8.
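A rough sketch of the "encode only what breaks parsing" idea follows, in Python for illustration. The helper name `escape_minimal` and its `unsafe` parameter are hypothetical; the default set contains just the three characters named above (&, ? and space), and whether that set is sufficient for a given application is an assumption to verify.

```python
def escape_minimal(name: str, unsafe: str = "&? ") -> str:
    """Percent-encode only the characters in `unsafe`; leave everything else,
    including non-ASCII characters, exactly as it is."""
    out = []
    for ch in name:
        if ch in unsafe:
            # Escape each UTF-8 byte of the character as %XX.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


escaped_name = escape_minimal("日本語 a&b?.jpg")
print(escaped_name)
```

The Japanese characters pass through untouched, so the browser or plugin is left to pick its own encoding for them, while the characters that would change how the URL is split are still escaped.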
