Dealing with the Unicode replacement character �: how do I get rid of it? (Android/Java)

I am using a terminal emulator library to create a terminal, and I use it to send the data entered in it over serial to a serial device. The library can be seen here.

When I enter data into the terminal, a strange series of characters is sent/received. I think the Unicode replacement character (U+FFFD) gets sent over serial, the serial device doesn't know what it is, and it returns ~0.

Screenshot of what appears in the terminal when I type "test": (image not available)

The log showing the strings sent and the data received: http://i.imgur.com/x79aPzv.png

I create an EmulatorView, which is the terminal view. Its sendText() method mentions the diamonds here:

private void sendText(CharSequence text) {
    int n = text.length();
    char c;
    try {
        for (int i = 0; i < n; i++) {
            c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                int codePoint;
                if (++i < n) {
                    codePoint = Character.toCodePoint(c, text.charAt(i));
                } else {
                    // Unicode Replacement Glyph, aka white question mark in black diamond.
                    codePoint = '\ufffd';
                }
                mapAndSend(codePoint);
            } else {
                mapAndSend(c);
            }
        }
    } catch (IOException e) {
        Log.e(TAG, "error writing ", e);
    }
}
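Reading the loop, sendText() only produces U+FFFD when a high surrogate is the last char of the text, i.e. half of a surrogate pair with nothing after it. For plain ASCII input such as "test", every char goes straight to mapAndSend(). The sketch below mirrors that loop but collects the code points into a list instead of sending them, which makes this easy to check. SendTextTrace is a hypothetical name for illustration, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

class SendTextTrace {
    // Same iteration as sendText(), but records each code point it would pass to mapAndSend().
    static List<Integer> codePoints(CharSequence text) {
        List<Integer> out = new ArrayList<>();
        int n = text.length();
        for (int i = 0; i < n; i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (++i < n) {
                    out.add(Character.toCodePoint(c, text.charAt(i)));
                } else {
                    // Dangling high surrogate at the end: this is where the replacement glyph is used.
                    out.add(0xFFFD);
                }
            } else {
                out.add((int) c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(codePoints("test"));    // [116, 101, 115, 116]
        System.out.println(codePoints("a\uD83D")); // [97, 65533]
    }
}
```

If typing "test" still produces diamonds, that points at a later step (for example, where bytes are decoded) rather than at this branch.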

Is there any way to fix this? Can anybody see in the library class why this is happening? And how can I refer to � in Java, so that I could parse it out if I wanted to? I take it I can't just write if (!str.contains("�")).
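For reference, the replacement character is U+FFFD and can be written in Java source as the escape \uFFFD, so the glyph itself never needs to be pasted into the code. A minimal sketch; the class and method names are made up for illustration:

```java
class ReplacementChar {
    // U+FFFD, the "white question mark in black diamond" glyph.
    static final char REPLACEMENT = '\uFFFD';

    // True if the string contains at least one U+FFFD.
    static boolean hasReplacement(String s) {
        return s.indexOf(REPLACEMENT) >= 0;
    }

    // Returns a copy of s with every U+FFFD removed.
    static String strip(String s) {
        return s.replace(String.valueOf(REPLACEMENT), "");
    }

    public static void main(String[] args) {
        String received = "te\uFFFDst";
        System.out.println(hasReplacement(received)); // true
        System.out.println(strip(received));          // test
    }
}
```

Stripping it only hides the symptom: by the time U+FFFD appears in a String, the original bytes it stands for have already been discarded by the decoder.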

When I type in the terminal, this is run:

public void write(byte[] bytes, int offset, int count) {
    String str;
    try {
        str = new String(bytes, "UTF-8");
        Log.d(TAG, "data received in write: " + str);

        GraphicsTerminalActivity.sendOverSerial(str.getBytes("UTF-8"));
    } catch (UnsupportedEncodingException e) {
        Log.d(TAG, "exception");
        e.printStackTrace();
    }

    // appendToEmulator(bytes, 0, bytes.length);

    return;
}
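One thing worth checking in write(): new String(bytes, "UTF-8") decodes the whole array and ignores offset and count. If the caller passes a reused buffer that is larger than the data it contains (an assumption, not verified against the library), any leftover or zero bytes past count are decoded too, and invalid ones come out as U+FFFD. A small sketch contrasting the two; WriteDecode and the sample buffer are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

class WriteDecode {
    // What write() currently does: decodes every byte in the array.
    static String decodeWhole(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes only the count bytes starting at offset, as the write() signature intends.
    static String decodeRange(byte[] bytes, int offset, int count) {
        return new String(bytes, offset, count, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical 4-byte buffer: 't' is the real data (count = 1), the rest is leftover.
        byte[] buf = {'t', (byte) 0xC3, 0, 0};
        System.out.println(decodeWhole(buf).length()); // 4 chars, one of them U+FFFD
        System.out.println(decodeRange(buf, 0, 1));    // t
    }
}
```

Passing offset and count through, and forwarding the bytes rather than a re-encoded String, avoids decoding anything that is not part of the data.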

This is what I call to send data. sendData(byte[] data) is a library method.

public static void sendOverSerial(byte[] data) {
    // Check for null before decoding; new String(null, "UTF-8") would throw.
    if (mSelectedAdapter == null || data == null) {
        return;
    }
    try {
        String str = new String(data, "UTF-8");
        Log.d(TAG, "send over serial string==== " + str);

        mSelectedAdapter.sendData(str.getBytes("UTF-8"));
    } catch (UnsupportedEncodingException e) {
        Log.d(TAG, "exception");
        e.printStackTrace();
    }
}
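sendOverSerial() decodes the bytes to a String and immediately re-encodes them. That round trip is not lossless: any byte sequence that is not valid UTF-8 is decoded to U+FFFD and comes back out as the three bytes EF BF BD, which would put the replacement character on the serial line. A sketch showing the effect next to a pass-through alternative; both method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTrip {
    // Mirrors sendOverSerial(): bytes -> String -> bytes, both as UTF-8.
    static byte[] decodeThenEncode(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Pass-through alternative: hand on the original bytes unchanged (copied defensively).
    static byte[] passThrough(byte[] data) {
        return Arrays.copyOf(data, data.length);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xFF}; // 0xFF never appears in valid UTF-8
        System.out.println(Arrays.toString(decodeThenEncode(raw))); // [-17, -65, -67], i.e. EF BF BD
        System.out.println(Arrays.toString(passThrough(raw)));      // [-1]
    }
}
```

In sendOverSerial() that would mean passing data straight to mSelectedAdapter.sendData() and using the decoded String only for the log message. Valid ASCII such as "test" survives the round trip unchanged, so this only matters once non-UTF-8 bytes reach the method.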

Once data is sent, the reply is received here:

public void onDataReceived(int id, byte[] data) {
    try {
        dataReceived = new String(data, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        Log.d(TAG, "exception");
        e.printStackTrace();
    }

    try {
        dataReceivedByte = dataReceived.getBytes("UTF-8");
    } catch (UnsupportedEncodingException e) {
        Log.d(TAG, "exception");
        e.printStackTrace();
    }
    statusBool = true;
    Log.d(TAG, "in data received " + dataReceived);
    ((MyBAIsWrapper) bis).renew(data);

    runOnUiThread(new Runnable() {
        @Override
        public void run() {
            mSession.appendToEmulator(dataReceivedByte, 0, dataReceivedByte.length);
        }
    });

    viewHandler.post(updateView);
}
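Because onDataReceived() decodes the reply to a String before logging it (and re-encodes it before appending to the emulator), the log cannot show what the device actually sent: invalid bytes are already U+FFFD by then. Logging the raw bytes as hex is one way to see what is really on the wire. HexLog.toHex is a hypothetical helper; on Android its result would go to Log.d.

```java
class HexLog {
    // Formats bytes as space-separated, two-digit uppercase hex, e.g. "7E 30".
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 3);
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[]{0x7E, 0x30}));                            // 7E 30 ("~0" in ASCII)
        System.out.println(toHex(new byte[]{(byte) 0xEF, (byte) 0xBF, (byte) 0xBD})); // EF BF BD
    }
}
```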

Relevant section of the library class where characters are written (mapAndSend(), which sendText() calls for each character):

private void mapAndSend(int c) throws IOException {
    int result = mKeyListener.mapControlChar(c);
    if (result < TermKeyListener.KEYCODE_OFFSET) {
        mTermSession.write(result);
    } else {
        mKeyListener.handleKeyCode(result - TermKeyListener.KEYCODE_OFFSET, getKeypadApplicationMode());
    }
    clearSpecialKeyStatus();
}
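mapAndSend() hands a code point to mTermSession.write(int), which has to turn it into bytes before anything reaches write(byte[], int, int). How the library does that internally is not shown here, but the standard way to encode one code point as UTF-8 looks like this (CodePointBytes is an illustrative name). It also shows that U+FFFD itself is three bytes, EF BF BD, once encoded:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CodePointBytes {
    // Encodes one Unicode code point (BMP or supplementary) as UTF-8.
    static byte[] utf8(int codePoint) {
        // Character.toChars yields one char, or a surrogate pair above U+FFFF.
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(utf8('t').length);              // 1
        System.out.println(Arrays.toString(utf8(0xFFFD))); // [-17, -65, -67]
        System.out.println(utf8(0x1F600).length);          // 4
    }
}
```

Only code points from U+0080 upward take more than one byte, so typing "test" should produce exactly four bytes in total.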
Paul asked Jan 25 '13 16:01



1 Answer

Java stores text internally as unencoded Unicode in UTF-16: each char is 16 bits, and characters outside the Basic Multilingual Plane take a pair of them (a surrogate pair). The fact that you're getting four characters of output on your terminal for every character you type suggests that something is turning each character into four bytes.

What you probably want to do is use something like string.getBytes("US-ASCII") to convert your Unicode string into plain single-byte ASCII. If your terminal emulator handles other character sets (like Latin-1, "ISO-8859-1"), use that instead.

Then, transmit the bytes to your terminal emulator instead of the string.

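Notes: the canonical Java name of the character set is "US-ASCII" ("ASCII" is accepted as an alias), and StandardCharsets.US_ASCII avoids the name lookup altogether. As for characters that can't be translated to ASCII: getBytes(Charset) replaces them with the charset's default replacement byte, which for US-ASCII is '?', while the behavior of getBytes(String charsetName) in that case is documented as unspecified.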
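A minimal sketch of that suggestion, assuming the device only understands 7-bit ASCII (StandardCharsets.ISO_8859_1 can be swapped in if it accepts Latin-1). StandardCharsets needs Android API level 19 or later; on older versions Charset.forName("US-ASCII") does the same job. The class name is illustrative:

```java
import java.nio.charset.StandardCharsets;

class AsciiBytes {
    // Encodes text as single-byte US-ASCII. getBytes(Charset) replaces anything
    // that can't be mapped with the charset's default replacement, '?' for US-ASCII.
    static byte[] toAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] plain = toAscii("test");
        System.out.println(plain.length);       // 4
        byte[] accented = toAscii("caf\u00e9");
        System.out.println((char) accented[3]); // ?
    }
}
```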

ETA: I'm having trouble following your code logic from the fragments you posted. Who calls write(), where does the data come from, and where does it go? The same questions apply to sendOverSerial() and onDataReceived().

In any event, I'm almost certain that somewhere, raw 32-bit Unicode data (four bytes per character) was converted to bytes without being properly encoded. From that point on, either sending it as-is or re-encoding it as UTF-8 would produce the effect you're seeing. I don't see how this could have happened in any of the code you posted, so I'm guessing it happened elsewhere, before any of the functions you showed us are called.
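To illustrate the kind of mix-up described above (an illustration of the hypothesis only, not something traced in the posted code): if a string is turned into bytes with a fixed four-byte encoding such as UTF-32 and those bytes are then treated as UTF-8, each typed character becomes four characters, three of them NUL (U+0000). FourBytesPerChar is a made-up name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FourBytesPerChar {
    // Encodes with UTF-32BE (4 bytes per code point), then wrongly decodes as UTF-8.
    static String mixUp(String text) {
        byte[] utf32 = text.getBytes(Charset.forName("UTF-32BE"));
        return new String(utf32, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = mixUp("t");
        System.out.println(garbled.length()); // 4: three NUL chars followed by 't'
    }
}
```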

Edward Falk answered Oct 11 '22 09:10
