
Initializing wonky characters in Java

I'm trying to use some funky characters in my Java code.

    Character c = new Character('🀀');

(If your web browser doesn't display the character, it's "1F000 🀀 MAHJONG TILE EAST WIND".)

Java complains about 'invalid character constant'. What gives? I thought Java's Character supported Unicode.

Also, is there a way to initialize a Character by its Unicode value? Something like new Character('0x01F000')?

Nicol asked Jan 21 '23 18:01


1 Answer

Characters outside the BMP (Basic Multilingual Plane) can't be represented as a Java char (and therefore not as a Character either), because a char is only a 16-bit unsigned integer. Java represents non-BMP characters using surrogate pairs.
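To make the surrogate-pair point concrete, here is a minimal sketch using only standard java.lang.Character methods. The class name SurrogatePairDemo and the helper toUtf16 are made up for illustration.

```java
// Sketch: how a code point above U+FFFF maps onto two 16-bit chars.
class SurrogatePairDemo {
    // Hypothetical helper: returns the UTF-16 code units for a code point.
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int eastWind = 0x1F000; // MAHJONG TILE EAST WIND

        // The code point lies outside the BMP, so it needs two chars.
        System.out.println(Character.isSupplementaryCodePoint(eastWind)); // true
        System.out.println(Character.charCount(eastWind));                // 2

        // Print the high and low surrogate as hex.
        char[] units = toUtf16(eastWind);
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]); // D83C DC00
    }
}
```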

You'll need to use a string... but even then I suspect you'll need to provide the surrogate pair of characters explicitly. C# has a \U escape sequence which is the equivalent of \u but for 32-bit values, but Java doesn't have anything like that :(
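For reference, spelling out the surrogate pair explicitly would look roughly like the sketch below. The pair D83C/DC00 is the UTF-16 encoding of U+1F000; the class and constant names are only illustrative.

```java
// Sketch: a string literal holding U+1F000 as an explicit surrogate pair.
class SurrogateLiteralDemo {
    // High surrogate D83C followed by low surrogate DC00.
    static final String EAST_WIND = "\uD83C\uDC00";

    public static void main(String[] args) {
        // Two chars, but only one Unicode code point.
        System.out.println(EAST_WIND.length());                              // 2
        System.out.println(EAST_WIND.codePointCount(0, EAST_WIND.length())); // 1
        System.out.println(Integer.toHexString(EAST_WIND.codePointAt(0)));   // 1f000
    }
}
```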

Here's an alternative approach which lets you use the Unicode value directly in your code:

    String x = new String(new int[] { 0x1f000 }, 0, 1);

It's ugly, but it works...
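If the int[] constructor feels too clunky, a couple of other standard-library routes should give the same string. This is a sketch rather than a recommendation; the class and method names are invented for the comparison.

```java
// Sketch: three ways to build a String from a code point value.
class CodePointStringDemo {
    // The approach from the answer: String(int[] codePoints, int offset, int count).
    static String viaIntArray(int codePoint) {
        return new String(new int[] { codePoint }, 0, 1);
    }

    // Character.toChars returns one or two chars depending on the code point.
    static String viaToChars(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // StringBuilder.appendCodePoint handles the surrogate pair internally.
    static String viaBuilder(int codePoint) {
        return new StringBuilder().appendCodePoint(codePoint).toString();
    }

    public static void main(String[] args) {
        int eastWind = 0x1F000;
        String a = viaIntArray(eastWind);
        String b = viaToChars(eastWind);
        String c = viaBuilder(eastWind);
        System.out.println(a.equals(b) && b.equals(c)); // true
        System.out.println(a.length());                 // 2
    }
}
```

Whichever route is used, the result is still a two-char String, so it can't be stored in a single Character; keep it as a String or as an int code point.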

Jon Skeet answered Jan 23 '23 08:01
