
Eclipse detail formatter string not displaying all Unicode characters

Tags:

java

eclipse

jdb

I would like to see the clipboard symbol 📋 (U+1F4CB) in the debugger.

I understand that in Java it is stored as two UTF-16 code units (a surrogate pair).

Where:

  • \ud83d is the high surrogate
  • \udccb is the low surrogate

I would like a detail formatter that makes it show up correctly as Unicode in the debug tooltip.

My current detail formatter (Preferences -> Java -> Debug -> Detail Formatters) is:

new String(this.getBytes("utf8"), java.nio.charset.Charset.forName("utf8")).concat(" <---")

(The code above does nothing except append <--- to the detail view.)

Question 1:

What formatter do I need to see the character displayed correctly in the yellow tooltip?

Source

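import java.nio.charset.StandardCharsets;

public class Test {
    public static void main(String[] args) {
        // UTF-8 encoding of U+1F4CB (clipboard): F0 9F 93 8B
        byte[] db = new byte[] { -16, -97, -109, -117 };
        String x = new String(db, StandardCharsets.UTF_8);
        // Set a breakpoint on the next line and inspect x in the debugger
        System.out.println(x);
    }
}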
Grim asked Jun 23 '18 09:06


1 Answer

The “📋” character is defined in the Unicode character set, and since String instances are sequences of Unicode characters, they may contain it. But it lies outside the Basic Multilingual Plane, so software processing it has to handle it with more care. Most notably, such software must not treat the string's char values in isolation: a char is a UTF-16 code unit, and this character is encoded as a pair of surrogate code units that only have meaning together.
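To make the surrogate pair concrete, here is a minimal sketch using only standard `java.lang` methods. The class name `SurrogateDemo` and the helper `utf16Units` are made up for this illustration and are not part of the question or of Eclipse.

```java
public class SurrogateDemo {
    /** Formats each UTF-16 code unit of the given code point as a 4-digit hex escape. */
    public static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // Character.toChars returns one char for BMP code points, two for supplementary ones
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04x", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x1F4CB));
        System.out.println(s.length());                      // counts char values: 2
        System.out.println(s.codePointCount(0, s.length())); // counts code points: 1
        System.out.println(utf16Units(s.codePointAt(0)));    // the two surrogate code units
    }
}
```

For U+1F4CB the helper yields \ud83d followed by \udccb, which matches the two values seen in the debugger. Code that iterates with `charAt` sees those two halves separately, while `codePointAt` and `codePointCount` see one character.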

Your detail formatter specified as

new String(this.getBytes("utf8"), java.nio.charset.Charset.forName("utf8")) …

doesn’t help here, as this.getBytes("utf8") converts the Unicode String instance to a byte[] array in the UTF-8 encoding, which is then passed to the new String(…, Charset.forName("utf8")) constructor, converting the byte array back to a String with identical content. If Eclipse’s debugger failed to render the original string, it won’t suddenly render an identical string correctly after that redundant operation.
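The redundancy is easy to check outside the debugger. The following sketch assumes a well-formed input string (no unpaired surrogates); `RoundTripDemo` and `roundTrip` are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    /** Encodes the string to UTF-8 bytes and decodes it again, as the formatter does. */
    public static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = new String(Character.toChars(0x1F4CB));
        String copy = roundTrip(original);
        // Same content, same two UTF-16 code units: nothing for the debugger to render differently
        System.out.println(original.equals(copy));
        System.out.println(copy.length());
    }
}
```

Since the decoded string is equal to the original, whatever renders the original incorrectly receives the same char values again.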

Generally, if Eclipse’s debugger is incapable of correctly rendering strings containing characters outside the Basic Multilingual Plane, there is nothing you can do in a Detail Formatter to fix that, because whatever processing you do there eventually ends up in a String, perhaps after applying a chain of Detail Formatters. So the end result can only be one of two things: a String with the problematic character removed, or a String which Eclipse’s debugger can’t render correctly.
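As an illustration of the first of those two outcomes, the sketch below replaces every character outside the Basic Multilingual Plane with a textual U+XXXX marker. It does not make the glyph appear; it only swaps the problematic character for something any renderer can show. The class and method names are invented for this example, and whether the body can be pasted unchanged into a Detail Formatter (with `this` in place of `s`) is an assumption that has not been verified here.

```java
public class EscapeSupplementary {
    /** Replaces each supplementary code point with a marker such as <U+1F4CB>. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                sb.append(String.format("<U+%X>", cp)); // textual stand-in for the character
            } else {
                sb.appendCodePoint(cp);                 // BMP characters pass through unchanged
            }
            i += Character.charCount(cp);               // advance by 1 or 2 char values
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String x = "a" + new String(Character.toChars(0x1F4CB)) + "b";
        System.out.println(escape(x));
    }
}
```

The loop steps by `Character.charCount(cp)` so that both halves of a surrogate pair are consumed together rather than being visited as separate char values.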

In other words, this is a bug that can only get fixed on Eclipse’s side.

Holger answered Nov 03 '22 04:11