I'm having problems dealing with Unicode characters from supplementary ("astral") planes in JavaFX. Specifically, I can't paste such characters in a TextInputDialog
(I get some weird characters instead, such as ð
), and can't use them in a WebView (they get rendered as ������
).
The same characters are working perfectly fine if I input them via JOptionPane.showInputDialog
and print them to the console. They even show in a JavaFX Alert
, although it appends some junk at the end.
Is there a way to fix these problems?
I'm using Oracle JDK version 1.8.0_51 in Linux.
Examples of supplementary plane characters: 😀 𐂃 🂡 🙭 𫞂
If you can't see them, you may need to install additional fonts such as Symbola or Noto.
Here's an example program (using a Label
rather than a WebView
):
import javax.swing.JOptionPane;
import javafx.application.Application;
import javafx.scene.Scene;
import javafx.scene.control.Alert;
import javafx.scene.control.Alert.AlertType;
import javafx.scene.control.Label;
import javafx.scene.control.TextInputDialog;
import javafx.scene.layout.StackPane;
import javafx.stage.Stage;
public class UniTest extends Application {
@Override
public void start(final Stage stage) throws Exception {
final String s = new String(new int[]{127137, 178050, 3232, 128512, 241}, 0, 5);
System.out.println("The string: " + s);
System.out.println("Characters: " + s.length());
System.out.println("Code points: " + s.codePoints().count());
JOptionPane.showMessageDialog(null, s, "JOptionPane", JOptionPane.INFORMATION_MESSAGE);
final Alert al = new Alert(AlertType.INFORMATION);
al.setTitle("Alert");
al.setContentText(s);
al.showAndWait();
final TextInputDialog dlg = new TextInputDialog();
dlg.setTitle("TextInputDialog");
dlg.setContentText("Try to paste the string in here");
dlg.showAndWait().ifPresent(x -> System.out.println("Your input: " + x));
final StackPane root = new StackPane();
root.getChildren().add(new Label(s));
stage.setScene(new Scene(root, 400, 300));
stage.setTitle("Stage");
stage.show();
}
public static void main(final String... args) {
launch(args);
}
}
And here are the results I get:
Note: not all the characters in the example are from supplementary planes, and one of the characters is only rendered correctly in the console.
Here is the text you are using.
🂡𫞂ಠ😀ñ
Decimal codepoint representation:
127137 178050 3232 128512 241
Hex representation:
0x1F0A1 0x2B782 0xCA0 0x1F600 0xF1
Java uses UTF-16 internally. So consider the UTF-16 representation:
UTF-16 representation:
D83C DCA1 D86D DF82 0CA0 D83D DE00 00F1
We can see that the display is showing the five characters you expect, but then three garbage characters.
So it is clearly trying to display 8 glyphs, where there are only five. This is almost certainly because the display code is counting 8 characters, because three characters are encoded in UTF-16 as surrogate pairs, so take two 16-bit words each. In other words it is using the wrong value for the length of the string in the presence of surrogate pairs.
UTF-8 Representation of test data:
F0 9F 82 A1 F0 AB 9E 82 E0 B2 A0 F0 9F 98 80 C3 B1
What is seen is
00F0 ð LATIN SMALL LETTER ETH
009F <control> = APC = APPLICATION PROGRAM COMMAND
0082 <control> = BPH = BREAK PERMITTED HERE
00A1 ¡ INVERTED EXCLAMATION MARK
00F0 ð LATIN SMALL LETTER ETH
(The two control characters can have glyphs in some fonts containing either their abbreviations or hex codes. These are visible in your example.)
Latin1 hex representation:
F0 9F 82 A1 F0
Note that these five bytes are the same as the first five bytes of the UTF-8 representation of the intended text.
Conclusion: The pasted data has been pasted as 5 UTF-8 codepoints occupying 17 bytes, but interpreted as 5 Latin1 codepoints occupying 5 bytes. Again, the wrong property has been used for the length.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With