
Print chess symbols using UnicodeBlock?

JDK 12 added support for chess symbols (source):

Unicode 11.0.0 introduced the following new features that are now included in JDK 12

[...] 4 blocks for the following existing scripts:

  • Georgian Extended

  • Mayan Numerals

  • Indic Siyaq Numbers

  • Chess Symbols

With that in mind, I tried to print those characters with the following code, to test the functionality and use those later in a little chess game:

Character.UnicodeBlock block = Character.UnicodeBlock.CHESS_SYMBOLS;
for (int i = 0; i < 1114112; i++) {
    char unicode = (char) i;
    if(Character.UnicodeBlock.of(unicode) == block) {
        System.out.println(unicode);
    }
}

However, it prints nothing. The code works if I replace CHESS_SYMBOLS with, for instance, ARABIC. I am using Java 12.0.1.

Question: Why isn't the above code printing anything?

Paul Lemarchand asked May 14 '19 12:05


2 Answers

Some chess symbol characters exist in the Miscellaneous Symbols block, but you are specifically checking for 16-bit char values in a different block. The Chess Symbols block contains zero characters with 16-bit values; it starts at U+1FA00, and ends at U+1FA6F.

By casting to char, you are trimming all values above U+FFFF to their lowest 16 bits; for example, if i is 0x1fa60, casting it to a char will make it 0xfa60, which prevents your block check from succeeding.
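
To see the truncation directly, here is a minimal sketch. The class name CharCastDemo and the helper truncate are hypothetical names chosen for illustration; only the cast itself comes from the question.

```java
class CharCastDemo {
    /** Narrows a code point to char exactly as the cast in the question does. */
    static int truncate(int codePoint) {
        return (char) codePoint; // keeps only the low 16 bits
    }

    public static void main(String[] args) {
        int codePoint = 0x1FA60; // a code point inside the Chess Symbols block
        int truncated = truncate(codePoint);

        // Show the value before and after the narrowing cast.
        System.out.printf("U+%X -> U+%X%n", codePoint, truncated);

        // The two values fall into different blocks, so the check in the
        // question can never match CHESS_SYMBOLS.
        System.out.println(Character.UnicodeBlock.of(codePoint));
        System.out.println(Character.UnicodeBlock.of(truncated));
    }
}
```

The truncated value lands in an unrelated 16-bit block, which is why swapping in a block such as ARABIC (entirely below U+10000) appears to work.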

To make your code work, you need to stop assuming that all codepoints are 16-bit values. You can do that by changing this:

char unicode = (char) i;

to this:

int unicode = i;
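
One detail to watch with this change: once unicode is an int, System.out.println(unicode) resolves to the int overload and prints a decimal number instead of a glyph. The following is a rough sketch of a complete loop that converts each code point to a String first. It assumes JDK 12 or later for the CHESS_SYMBOLS constant, and the names BlockCharacters and charactersIn are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class BlockCharacters {
    /** Collects every assigned code point of the given block as a String. */
    static List<String> charactersIn(Character.UnicodeBlock block) {
        List<String> result = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            // of(int) may return null for code points outside every block,
            // so compare by reference instead of calling a method on the result.
            if (Character.UnicodeBlock.of(cp) == block && Character.isDefined(cp)) {
                // Build a String from the code point; printing the int itself
                // would show a number.
                result.add(new String(Character.toChars(cp)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // CHESS_SYMBOLS requires JDK 12 or later.
        for (String s : charactersIn(Character.UnicodeBlock.CHESS_SYMBOLS)) {
            System.out.println(s);
        }
    }
}
```

Whether the glyphs actually render depends on the console encoding and on the installed fonts, which this sketch does not address.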

VGR answered Oct 25 '22 07:10


Unfortunately, Character.UnicodeBlock has no methods that report the first and last code points of a block. In Unicode 11 the Chess Symbols block spans U+1FA00 to U+1FA6F, of which only U+1FA60 to U+1FA6D are assigned.
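
Since the API does not expose a block's bounds, one possible workaround is to derive them by scanning every code point. This is only a sketch under that assumption; BlockRange and rangeOf are hypothetical names, not part of the JDK.

```java
class BlockRange {
    /** Returns {first, last} code point of the block, or null if nothing matched. */
    static int[] rangeOf(Character.UnicodeBlock block) {
        int first = -1;
        int last = -1;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            // Reference comparison is safe here even when of(int) returns null.
            if (Character.UnicodeBlock.of(cp) == block) {
                if (first < 0) {
                    first = cp;
                }
                last = cp;
            }
        }
        return first < 0 ? null : new int[] {first, last};
    }

    public static void main(String[] args) {
        // CHESS_SYMBOLS requires JDK 12 or later.
        int[] range = rangeOf(Character.UnicodeBlock.CHESS_SYMBOLS);
        if (range != null) {
            System.out.printf("U+%X..U+%X%n", range[0], range[1]);
        }
    }
}
```

The scan reports the extent of the block as the running JDK defines it, including code points in the block that are not yet assigned to a character.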

Java uses UTF-16, which represents code points above U+FFFF as surrogate pairs. In this case code point U+1FA60 is represented as two char values: U+D83E (high surrogate) and U+DE60 (low surrogate).
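
The surrogate pair for a given code point can be checked with Character.toChars. A small sketch using U+1FA60 as the sample value; the class name SurrogateDemo and the helper utf16Units are hypothetical.

```java
class SurrogateDemo {
    /** Returns the UTF-16 code units that encode the given code point. */
    static char[] utf16Units(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int codePoint = 0x1FA60;
        char[] units = utf16Units(codePoint);

        // A supplementary code point needs two char values.
        System.out.printf("U+%X uses %d char value(s)%n", codePoint, units.length);
        for (char unit : units) {
            System.out.printf("U+%04X%n", (int) unit);
        }

        // The same two values are available individually.
        System.out.printf("high=U+%04X low=U+%04X%n",
                (int) Character.highSurrogate(codePoint),
                (int) Character.lowSurrogate(codePoint));
    }
}
```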

Use Character.toChars() to print the code point correctly, since a code point is always an int:

Character.UnicodeBlock block = Character.UnicodeBlock.CHESS_SYMBOLS;
for (int i = 0; i < 1114112; i++) {
    // of(int) returns null for code points outside every block, so call equals on block
    if (block.equals(Character.UnicodeBlock.of(i))) {
        System.out.println(Character.toChars(i));
    }
}

Karol Dowbecki answered Oct 25 '22 06:10