How many characters are there in Java?

Tags: java, character

How many unique characters exist in Java? I've looped to over 10,000, and characters are still being found:

for (int i = 0; i < 10000; i++)
    System.out.println((char) i);

Are there Integer.MAX_VALUE characters? I always thought there were only 255 for some reason.

MCMastery asked Feb 08 '23 14:02


2 Answers

Java uses Unicode. Unicode code points range from U+0000 to U+10FFFF, which gives 1,114,112 possible values.

But not all of them are defined. If you want to know how many of them are "supported", in the sense of belonging to a Unicode block that your JDK knows about, you can use this:

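import java.util.Objects;
import java.util.stream.IntStream;

// UnicodeBlock.of returns null for code points outside every known block
final long nrChars = IntStream.rangeClosed(0, 0x10ffff)
    .mapToObj(Character.UnicodeBlock::of)
    .filter(Objects::nonNull)
    .count();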
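
Belonging to a block is not quite the same as being an assigned character, so it may help to compare the two counts side by side. The following is a minimal sketch, not part of the original answer: the class name CodePointCounts and its method names are made up for illustration, and the results depend on the Unicode version bundled with your JDK, so no figures are quoted here.

```java
import java.util.stream.IntStream;

// Hypothetical helper class, named here for illustration only.
class CodePointCounts {

    // Code points that fall inside some Unicode block known to this JDK.
    static long inSomeBlock() {
        return IntStream.rangeClosed(Character.MIN_CODE_POINT, Character.MAX_CODE_POINT)
                .filter(cp -> Character.UnicodeBlock.of(cp) != null)
                .count();
    }

    // Code points that this JDK's Unicode tables report as defined.
    static long defined() {
        return IntStream.rangeClosed(Character.MIN_CODE_POINT, Character.MAX_CODE_POINT)
                .filter(Character::isDefined)
                .count();
    }

    public static void main(String[] args) {
        long total = (long) Character.MAX_CODE_POINT + 1; // 0x110000 possible values
        System.out.println("possible code points: " + total);
        System.out.println("in some block:        " + inSomeBlock());
        System.out.println("defined:              " + defined());
    }
}
```

Both counts stay below the 1,114,112 possible values, since blocks usually contain some unassigned positions and not every code point belongs to a block.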

Also note that, for historical reasons, Java's char can directly represent only code points up to U+FFFF. For the "rest" (which by now make up the majority of defined code points), Java uses a surrogate pair, i.e. two chars for one code point. See Character.toChars().
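
To illustrate the surrogate-pair point, here is a small sketch using Character.toChars() and related standard methods. The class name SurrogateDemo and the helper encode are hypothetical; U+00E9 and U+1F600 are simply example code points picked for this illustration.

```java
// Hypothetical demo class, named here for illustration only.
class SurrogateDemo {

    // Returns the UTF-16 chars Java uses to represent one code point.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int bmp = 0x00E9;            // inside the Basic Multilingual Plane
        int supplementary = 0x1F600; // above U+FFFF

        System.out.println(encode(bmp).length);            // prints 1
        char[] pair = encode(supplementary);
        System.out.println(pair.length);                   // prints 2
        System.out.println(Integer.toHexString(pair[0]));  // prints d83d (high surrogate)
        System.out.println(Integer.toHexString(pair[1]));  // prints de00 (low surrogate)

        // A String counts chars, not code points.
        String s = new String(pair);
        System.out.println(s.length());                        // prints 2
        System.out.println(s.codePointCount(0, s.length()));   // prints 1

        // The pair decodes back to the original code point.
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == supplementary); // prints true
    }
}
```

This is also why casting an int to char, as in the question's loop, can never go past U+FFFF: the cast keeps only the low 16 bits.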

fge answered Feb 13 '23 22:02


Java was designed to use Unicode internally, so that diverse scripts can be combined in one String. Unicode assigns a number to the characters of all scripts, with values reaching into the 3-byte range (up to U+10FFFF). Such Unicode "code points" are represented as int in Java.

From the beginning, char and String were meant for text, with char using UTF-16 (a Unicode representation built on 16-bit units, sometimes needing two chars for one Unicode code point). (However, String constants in a .class file are stored in a modified form of UTF-8.)

Hence char takes 2 bytes, while byte takes 1 byte and byte[] is for binary data.
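
The difference between chars, code points, and bytes can be seen on a short string. This is only a sketch: the class name CharVsByte and its helper methods are hypothetical, and the sample string (a, U+00E9, U+1F600) is an arbitrary choice written with escapes so it does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named here for illustration only.
class CharVsByte {

    // Number of UTF-16 units (Java chars) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes needed when the text is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES); // prints 2
        System.out.println(Byte.BYTES);      // prints 1

        // "a", U+00E9, and U+1F600 (the last one as a surrogate pair)
        String text = "a\u00E9\uD83D\uDE00";
        System.out.println(utf16Units(text)); // prints 4
        System.out.println(codePoints(text)); // prints 3
        System.out.println(utf8Bytes(text));  // prints 7 (1 + 2 + 4 bytes)
    }
}
```

Text becomes a byte[] only once an encoding is chosen, which is why getBytes takes a charset argument here.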

In earlier languages (C, C++) there was often no such distinction between char and byte.

Joop Eggen answered Feb 13 '23 23:02