Why does "Ꙭ".codePointAt(0)==205 and other Java Character bizarreness?

Question

(Lest this get closed as too localized: I chose Ꙭ as an example, but this happens for many other characters as well.)

The character Ꙭ is \uA66C or decimal 42604 (http://unicodinator.com/#A66C). I'm seeing some very weird things I can't understand while using Java's Character class.

1) Character.isLetter('Ꙭ'); // won't compile, complains 'unclosed character literal'
2) Character.isLetter("Ꙭ".charAt(0)); // returns true, which is right
3) Character.isLetter(42604); // returns false
4) Character.isLetter('\uA66C'); // returns false
5) "Ꙭ".codePointAt(0); // returns 205? 205 is Í http://unicodinator.com/#00CD
6) ("Ꙭ".charAt(0) == (char) 42604) // is false

Nothing except #2 makes sense to me. This character is in the BMP and is not in the range \uD800 to \uDFFF, so there shouldn't be any complexity with surrogates. It seems like I'm missing some key concept here...

Tom Hawtin - tackline · Accepted Answer

It looks as if the character encoding your editor is using is different from the one used by javac (or an equivalent compiler). By default javac picks up whichever encoding happens to be set as the default on your machine. Use -encoding to change it for javac, for example javac -encoding UTF-8.

Ꙭ in UTF-8 is the three bytes 0xEA 0x99 0xAC, which read as Latin 1 (or similar) appear as ê¬ with an invisible control character (0x99) in between. That isn't valid for a character literal, as it is three characters.
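The mismatch can be reproduced without touching any compiler settings. The following is a minimal sketch, assuming the file was saved as UTF-8 and read back as ISO-8859-1 (Latin 1); the class name MojibakeDemo and the method misread are made up for illustration. The character is written as a Unicode escape so that the example is itself immune to the problem.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode the text as UTF-8, then decode those bytes as ISO-8859-1,
    // mimicking a compiler that reads a UTF-8 file with the wrong charset.
    // (Hypothetical helper, not part of any library.)
    public static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\uA66C";
        String garbled = misread(original);

        System.out.println(original.length()); // 1
        System.out.println(garbled.length());  // 3

        // One char per UTF-8 byte: 0xEA, 0x99, 0xAC
        for (char c : garbled.toCharArray()) {
            System.out.printf("0x%02X%n", (int) c);
        }
    }
}
```

Under Latin 1 the first misread character is 0xEA (234). The 205 reported in item 5 would be consistent with a legacy Mac default such as MacRoman, where byte 0xEA maps to Í, but that is a guess about the asker's machine rather than something the question states.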

As for 3 and 4, the character was apparently introduced in the relatively new Unicode 5.1.0 (March 2008), which presumably isn't supported by the version of Java you are using. Java SE 6 uses Unicode 4.0; Java SE 7 uses Unicode 6.0.0.
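To see what a particular runtime knows about the character, you can ask the Character class directly. This is a rough sketch; UnicodeSupportCheck and describe are hypothetical names, and the result for U+A66C depends on the Unicode version built into the JVM that runs it, so no output is shown for that line.

```java
public class UnicodeSupportCheck {
    // Summarise what this runtime's Character tables say about a code point.
    // (Hypothetical helper for illustration.)
    public static String describe(int codePoint) {
        return String.format("U+%04X defined=%b letter=%b type=%d",
                codePoint,
                Character.isDefined(codePoint),
                Character.isLetter(codePoint),
                Character.getType(codePoint));
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        // Result varies with the Unicode tables shipped in the running JVM.
        System.out.println(describe(0xA66C));
        // A character that every Java version knows, for comparison.
        System.out.println(describe('A'));
    }
}
```

If defined comes back false for U+A66C, the runtime's Unicode tables predate the character, and isLetter returning false for items 3 and 4 is the expected behaviour rather than a bug.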

Most people stick to US ASCII for source files, with good reason.
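A small sketch of what that looks like in practice, with AsciiSourceDemo and SAMPLE as made-up names: the character is spelled with a Unicode escape, so the source file contains only ASCII bytes and compiles the same way whatever encoding javac assumes.

```java
public class AsciiSourceDemo {
    // The Unicode escape keeps the file pure ASCII, so the compiler sees the
    // same character regardless of the encoding it assumes for the source.
    public static final String SAMPLE = "\uA66C";

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                    // 1
        System.out.println(SAMPLE.codePointAt(0));              // 42604
        System.out.println(SAMPLE.charAt(0) == (char) 42604);   // true
        System.out.println(SAMPLE.charAt(0) == '\uA66C');       // true
    }
}
```

Written this way, items 5 and 6 from the question behave as expected, because the comparison no longer depends on how the literal Ꙭ was decoded.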
