I'm pretty new, so don't be too harsh :)
I'm facing a problem passing a Unicode string from an embedded javax.swing.JApplet in a web page to the JavaScript part. I'm not sure whether this is a bug or a misunderstanding of the involved technologies:
I want to pass a Unicode string from a Java applet to JavaScript, but the string gets messed up. Strangely, the problem doesn't occur in Internet Explorer 10, but it does in Chrome (v26) and Firefox (v20). I haven't tested other browsers, though.
The returned string seems to be okay, except for the last Unicode character. The result in the JavaScript debugger and web page would be:
The string seems to get corrupted at the last bytes. If it ends with an ASCII character, the string is okay. Additionally, the problem doesn't occur with every combination of characters, and possibly not every time (I'm not sure about this). Therefore I suspect a bug, and I'm afraid I might be posting an invalid question.
A minimal setup includes an applet that returns some Unicode (UTF-8) strings:
/* TestApplet.java */
import javax.swing.*;
public class TestApplet extends JApplet {
private String[] testStrings = {
"abc", // OK (because ASCII only)
"表示", // Error on last Character
"表示", // Error on last Character
"ホーム ", // OK (because of *space* after ム)
"アップロード", ... };
public TestApplet() {...}; // Applet specific stuff
...
public int getLength() { return testStrings.length; }
public String getTestString(int i) { // public, so that it can be called from JavaScript
return testStrings[i]; // Built-in array functionality because of IE.
}
}
The corresponding web page with JavaScript could look like this:
/* test.html */
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<span id="output"></span>
<applet id='applet' archive='test.jar' code='TestApplet'></applet>
</body>
<script type="text/javascript" charset="utf-8">
var applet = document.getElementById('applet');
var node = document.getElementById('output');
for(var i = 0; i < applet.getLength(); i++) {
var text = applet.getTestString(i);
var paragraphNode = document.createElement("p");
paragraphNode.innerHTML = text;
node.appendChild(paragraphNode);
}
</script>
</html>
I'm working on Windows 7 32-Bit with the current Java Version 1.7.0_21 using the "Next Generation Java Plug-in 10.21.2 for Mozilla browsers". I had some problems with my operating system locale, but I tried several (English, Japanese, Chinese) regional settings.
In case of a corrupt string, Chrome shows invalid characters (e.g. ��). Firefox, on the other hand, drops the string completely if it would end with ��.
Internet Explorer manages to display the strings correctly.
I can imagine several workarounds, including escaping/unescaping and adding a "final char" which is then removed via JavaScript. Actually, I'm planning to write against Android's WebKit, and I haven't tested it there.
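For illustration, the JavaScript side of those two workarounds might look roughly like the sketch below. This is only a sketch under assumptions: the sentinel character "|" and the function names stripSentinel and decodeEscaped are hypothetical, and the second function assumes the applet returns the result of java.net.URLEncoder.encode(s, "UTF-8"), so that only ASCII characters cross the Java/JavaScript boundary.

```javascript
// Hypothetical sentinel that the applet would append to every string,
// so that the returned value never ends with a non-ASCII character.
var SENTINEL = "|";

// Workaround 1: remove the trailing sentinel again on the JavaScript side.
function stripSentinel(text) {
  var s = String(text); // coerce a possible Java String wrapper to a JS string
  if (s.length > 0 && s.charAt(s.length - 1) === SENTINEL) {
    return s.substring(0, s.length - 1);
  }
  return s;
}

// Workaround 2: decode a percent-encoded, ASCII-only string.
// Assumes the applet used java.net.URLEncoder.encode(s, "UTF-8"),
// which writes spaces as "+", hence the replacement before decoding.
function decodeEscaped(text) {
  return decodeURIComponent(String(text).replace(/\+/g, "%20"));
}
```

With these helpers, stripSentinel("表示|") and decodeEscaped("%E8%A1%A8%E7%A4%BA") should both give back "表示". Whether either one actually avoids the corruption in Chrome and Firefox is untested here.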
Since I would like to continue testing in Chrome (because of the WebKit technology and for comfort), I hope there is a trivial solution to the problem that I might have overlooked.
If you are testing in Chrome/Firefox, please replace the first line with this and then test it:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
The doctype matters for how the browser interprets the page.
Transitional/loose is the type you can use with Unicode. Please test and reply.
I suggest setting a breakpoint on
paragraphNode.innerHTML = text;
and inspecting text in the JavaScript console, e.g. with
console.log(escape(text));
or
console.log(encodeURIComponent(text));
or
for (var i = 0; i < text.length; i++) {
console.log("i = "+i);
console.log("text.charAt(i) = "+text.charAt(i)
+", text.charCodeAt(i) = "+text.charCodeAt(i));
}
See also
http://www.fileformat.info/info/unicode/char/30a6/index.htm
https://developer.mozilla.org/en-US/docs/DOM/window.escape (which is not part of any standard)
and
https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/encodeURIComponent
or similar resources.
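To make the console output easier to compare between browsers, the loop above could be wrapped in a small helper that prints every UTF-16 code unit in hex. This is just a sketch: the names describeCodeUnits and hasReplacementChar are made up for illustration, and the check for U+FFFD assumes that the "��" shown by Chrome is the Unicode replacement character.

```javascript
// Returns the UTF-16 code units of a string as hex, e.g. "0x8868 0x793A".
function describeCodeUnits(text) {
  var s = String(text); // coerce a possible Java String wrapper to a JS string
  var parts = [];
  for (var i = 0; i < s.length; i++) {
    var hex = s.charCodeAt(i).toString(16).toUpperCase();
    while (hex.length < 4) {
      hex = "0" + hex; // pad to four hex digits
    }
    parts.push("0x" + hex);
  }
  return parts.join(" ");
}

// Assumption: a corrupted character arrives as U+FFFD (replacement character).
function hasReplacementChar(text) {
  return String(text).indexOf("\uFFFD") !== -1;
}
```

For an intact "表示", describeCodeUnits should return "0x8868 0x793A". Any other value for the last code unit would show exactly where the string was damaged.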
Your source files may not be in the encoding you assume (UTF-8).
JavaScript assumes UTF-16 strings:
http://www.ecma-international.org/ecma-262/5.1/#sec-4.3.16
Java also assumes UTF-16:
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
The Linux or Cygwin file command can show you the encoding of your files.
See
http://linux.die.net/man/1/file (haven't found a kernel.org man reference)
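If the file command is not at hand, a rough check can also be scripted. The sketch below assumes an environment that provides TextDecoder (recent Node.js versions or a modern browser). The function name isValidUtf8 is hypothetical. It only tells you whether the bytes are well-formed UTF-8, not which encoding the file really uses.

```javascript
// Returns true if the given bytes (Uint8Array or Node.js Buffer)
// decode cleanly as UTF-8, false if any byte sequence is malformed.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes decode() throw instead of inserting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}
```

In Node.js this could be called as isValidUtf8(require("fs").readFileSync("TestApplet.java")). Note that a pure ASCII file also passes, so a true result does not prove that the Japanese literals were saved as UTF-8. It may also be worth compiling with javac -encoding UTF-8, since javac otherwise reads sources in the platform default encoding.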