Why does my Unicode string get corrupted when passed from a Java applet to JavaScript?

Question

I'm pretty new, so don't be too harsh :)

Question (tl;dr)

I'm facing a problem passing a Unicode string from a javax.swing.JApplet embedded in a web page to the JavaScript part. I'm not sure whether this is a bug or a misunderstanding of the technologies involved.

Problem

I want to pass a Unicode string from a Java applet to JavaScript, but the string gets messed up. Strangely, the problem does not occur in Internet Explorer 10, only in Chrome (v26) and Firefox (v20). I haven't tested other browsers, though.

The returned string seems to be okay except for the last Unicode character. The results shown in the JavaScript debugger and on the web page are:

abc → abc
表示 → 表��
ま → ま
ウォッチリスト → ウォッチリス��
アップロード → アップロー��
ホ → ��
ホ → ホ (Not deterministic)
アップロードabc → アップロードabc

The string seems to get corrupted in its last bytes. If it ends with an ASCII character, the string is okay. Additionally, the problem doesn't occur with every combination, and possibly not every time (I'm not sure about this). Therefore I suspect a bug, and I'm afraid I might be posting an invalid question.
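One way to make the pattern above concrete (an illustration only; the question does not establish where the bytes are actually lost): in UTF-8 each of these kana and kanji takes three bytes, so if the final bytes of the encoded string were damaged somewhere between the plug-in and the browser, only the last character would break, and a decoder would substitute U+FFFD (rendered as �). The hypothetical `TruncationDemo` class below simulates that by dropping one trailing byte and decoding again.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TruncationDemo {
    // Hypothetical helper that SIMULATES damage; it is not the plug-in's real behaviour.
    // Encode as UTF-8, drop the last `dropped` bytes, decode again.
    // new String(byte[], Charset) always replaces malformed input with U+FFFD.
    static String dropTrailingBytes(String s, int dropped) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] cut = Arrays.copyOf(utf8, utf8.length - dropped);
        return new String(cut, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u8868\u793a"; // 表示, written as escapes so the source encoding doesn't matter
        // Each character is 3 bytes in UTF-8, so 6 bytes in total.
        System.out.println(original.getBytes(StandardCharsets.UTF_8).length);
        String damaged = dropTrailingBytes(original, 1);
        // Print code points so U+FFFD is visible even on a console that can't render it.
        damaged.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

The first character survives intact and the broken tail shows up as one or more U+FFFD code points; how many replacement characters appear depends on how the decoder handles the incomplete sequence.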

Test Setup

A minimal setup consists of an applet that returns some Unicode (UTF-8) strings:

/* TestApplet.java */
import javax.swing.*;

public class TestApplet extends JApplet {

    private String[] testStrings = {
            "abc",          // OK (ASCII only)
            "表示",         // Error on last character
            "表示",         // Error on last character
            "ホーム ",      // OK (because of the *space* after ム)
            "アップロード", ... };

    public TestApplet() {...};     // Applet-specific stuff

    ...

    // Methods called from JavaScript must be public to be reachable through LiveConnect.
    public int getLength() { return testStrings.length; }

    public String getTestString(int i) {
        return testStrings[i];    // Index-based accessor instead of returning the array, because of IE.
    }
}

The corresponding web page with JavaScript could look like this:

/* test.html */
<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    </head>
    <body>
        <!-- container for the output; <p> elements are not allowed inside a <span> -->
        <div id="output"></div>
        <!-- the applet needs its own id; code must match the class name TestApplet -->
        <applet id="testApplet" archive="test.jar" code="TestApplet"></applet>

        <script type="text/javascript" charset="utf-8">
            var applet = document.getElementById("testApplet");
            var node = document.getElementById("output");
            for (var i = 0; i < applet.getLength(); i++) {
                var text = applet.getTestString(i);
                var paragraphNode = document.createElement("p");
                paragraphNode.innerHTML = text;
                node.appendChild(paragraphNode);
            }
        </script>
    </body>
</html>

Environment

I'm working on Windows 7 32-bit with the current Java version 1.7.0_21, using the "Next Generation Java Plug-in 10.21.2 for Mozilla browsers". I had some problems with my operating system locale, but I tried several regional settings (English, Japanese, Chinese).
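Since the locale is mentioned as a possible factor, it may help to record what the JVM itself reports as its default encoding under each regional setting, by running a small program with the same JRE. Whether the plug-in's Java/JavaScript bridge actually depends on this default is an assumption, not something the question establishes; the class name `EncodingReport` is hypothetical.

```java
import java.nio.charset.Charset;

public class EncodingReport {
    public static void main(String[] args) {
        // Charset used whenever no charset is passed explicitly.
        // Before Java 18 this follows the OS locale (e.g. windows-1252 on an
        // English Windows); from Java 18 on it defaults to UTF-8.
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        // Related system property; normally consistent with the value above.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("java.version   = " + System.getProperty("java.version"));
    }
}
```

Comparing this output across the English, Japanese and Chinese settings would show whether the default encoding changes at all when the corruption does or does not occur.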

When a string is corrupted, Chrome shows invalid characters (e.g. ��). Firefox, on the other hand, drops the string completely if it would end with ��.

Internet Explorer displays the strings correctly.

Solutions?

I can imagine several workarounds, including escaping/unescaping and appending a "final char" that is then removed via JavaScript. Ultimately I'm planning to target Android's WebKit, and I haven't tested it there yet.
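The escaping workaround could look roughly like this on the Java side. This is a sketch under the assumption, suggested by the `abc` case, that pure-ASCII strings cross the bridge intact; `AsciiEscaper` and `escapeNonAscii` are hypothetical names. Every character outside printable ASCII, plus the double quote and the backslash, is written as a `\uXXXX` escape, so the result is also a valid JSON string body.

```java
public class AsciiEscaper {
    // Hypothetical helper: returns a pure-ASCII string in which every char outside
    // printable ASCII, plus '"' and '\\', is written as a backslash-u escape with
    // four hex digits. JavaScript can decode it with JSON.parse('"' + s + '"').
    public static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c <= 0x7E && c != '"' && c != '\\') {
                sb.append(c);
            } else {
                // Java chars and JavaScript string elements are both UTF-16 code units,
                // so surrogate pairs are escaped unit by unit and reassembled on decode.
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("\u8868\u793a")); // input is 表示
        System.out.println(escapeNonAscii("abc"));          // ASCII passes through unchanged
    }
}
```

On the JavaScript side, a hypothetical applet method such as `getEscapedTestString(i)` returning this value could be decoded with `JSON.parse('"' + applet.getEscapedTestString(i) + '"')`. The trade-off is deliberate: the strings get longer, but nothing outside ASCII has to survive the bridge.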

Since I would like to continue testing in Chrome (because of its WebKit engine and convenience), I hope there is a trivial solution to the problem that I might have overlooked.

MarmiK · Accepted Answer

If you are testing in Chrome/Firefox, please replace the first line with this and then test it:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

The doctype is significant when the browser identifies the page.

Transitional/loose is the type you can use with Unicode. Please test and reply.

stackunderflow · Answer

I suggest setting a breakpoint on

paragraphNode.innerHTML = text;

and inspecting text in the JavaScript console, e.g. with

console.log(escape(text));

or

console.log(encodeURIComponent(text));

or

// use j, not i, so the page's outer loop counter is not overwritten
for (var j = 0; j < text.length; j++) {
    console.log("j = " + j + ", text.charAt(j) = " + text.charAt(j)
        + ", text.charCodeAt(j) = " + text.charCodeAt(j));
}
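For comparison, the same per-character dump can be produced on the Java side before the string crosses the bridge. Java's `String.charAt` returns UTF-16 code units, which is also what JavaScript's `charCodeAt` reports, so the two listings should match line for line if nothing is lost in transit. The class `CodeUnitDump` is hypothetical; inside the applet its output could be printed to the Java Console.

```java
public class CodeUnitDump {
    // One line per UTF-16 code unit, mirroring the JavaScript loop:
    // index, character, and the numeric code that charCodeAt would report.
    static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append("i = ").append(i)
              .append(", charAt = ").append(c)
              .append(", code = ").append((int) c)
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump("\u8868\u793a")); // input is 表示
    }
}
```

For 表示 the Java side reports the codes 34920 and 31034; any mismatch in the browser's listing pinpoints the damaged position.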

Tags: java, javascript, unicode, utf-8, applet


Asked by Inuniku
