I read in the Javadoc for Character that:
"The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP)."
But when I ran the following code, I found that 2492 of those int values are not defined. Is something wrong, or have I misunderstood something? Thanks!
public static void main(String[] args)
{
    int count = 0;
    for (int i = 0x0000; i < 0xFFFF; i++)
    {
        if (!Character.isDefined(i))
        {
            count++;
        }
    }
    System.out.println(count);
}
Output:
2492
To be clear, in C, 0xffffffff is not -1; it is 4294967295. This value may or may not be representable by an int or an unsigned int (see 5.2.4.2.1p1). When the value cannot be represented by an int, converting it to an int has implementation-defined behaviour (see 6.3.1.3p3).
The Unicode standard was initially designed using 16 bits to encode characters because the primary machines were 16-bit PCs. When the specification for the Java language was created, the Unicode standard was accepted and the char primitive was defined as a 16-bit data type, with characters in the hexadecimal range from 0x0000 to 0xFFFF.
Java's char data type was originally designed to hold one 16-bit Unicode character. Today a char holds one UTF-16 (Unicode Transformation Format, 16-bit) code unit. That is why a char is 2 bytes in Java, whereas a char in C is 1 byte.
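As a small illustration of the 16-bit char, the sketch below prints the size of a char and the number of char values needed for a code point inside and outside the BMP. The class name CharWidthSketch and the helper charsNeeded are made up for this example; Character.SIZE, Character.BYTES (Java 8 and later), and Character.toChars are standard JDK members.

```java
public class CharWidthSketch {
    // Hypothetical helper: how many 16-bit char values one code point occupies.
    static int charsNeeded(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);        // bits in a char: 16
        System.out.println(Character.BYTES);       // bytes in a char: 2
        System.out.println(charsNeeded(0x0041));   // 'A', inside the BMP: 1
        System.out.println(charsNeeded(0x1F600));  // outside the BMP, a surrogate pair: 2
    }
}
```

Code points above U+FFFF do not fit in a single char, which is why many Character methods take an int code point instead.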
FFFF FFFF is all 1 bits (1111 1111 ...), and an int uses the first bit as the flag for a negative number. How does this work? Can someone explain it to me?
Simply put, 0xFF = 1111 1111 is the 8-bit two's complement representation of -1. Look up two's complement; it is the encoding scheme computers use for signed integers.
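The same two's complement idea can be checked in Java, with one caveat: Java's rules differ from C's. In Java the literal 0xFFFFFFFF has type int and its value is -1. The sketch below is only an illustration; the class name TwosComplementSketch and the helper asUnsigned are made up, while Integer.toUnsignedLong (Java 8 and later) and Integer.toBinaryString are standard JDK methods.

```java
public class TwosComplementSketch {
    // Hypothetical helper: reinterpret the 32 bits of an int as an unsigned value.
    static long asUnsigned(int value) {
        return Integer.toUnsignedLong(value);
    }

    public static void main(String[] args) {
        int allOnes = 0xFFFFFFFF;                        // every one of the 32 bits is set
        System.out.println(allOnes);                     // -1
        System.out.println(asUnsigned(allOnes));         // 4294967295
        System.out.println((byte) 0xFF);                 // -1 (8-bit two's complement)
        System.out.println(Integer.toBinaryString(-1));  // 32 ones
    }
}
```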
The documentation for isDefined() states that a character "is defined" if it has an entry or is in a range in the UnicodeData file. This identifies the set of code points that have been assigned to characters (and it might have been better named isAssigned()). As you discovered, not all of the code points in the Basic Multilingual Plane have been assigned to characters yet (this map shows where some of the empty spaces are).
However, even if a code point has not been assigned (that is, isDefined() returns false), it may be assigned in a future version of Unicode, and it is still a valid code point. Encoding, decoding, and working with unassigned code points is perfectly valid (although it is a little unusual).
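A minimal sketch of the difference between "defined" and "valid", using only standard Character methods. The class name UnassignedSketch and the helper countUndefinedInBmp are made up for this example. Unlike the loop in the question, which stops before U+FFFF, this loop includes U+FFFF. The count it prints depends on the Unicode version your JDK supports, so it will not necessarily be 2492.

```java
public class UnassignedSketch {
    // Hypothetical helper: counts BMP code points (U+0000..U+FFFF inclusive)
    // for which Character.isDefined returns false.
    static int countUndefinedInBmp() {
        int count = 0;
        for (int cp = 0x0000; cp <= 0xFFFF; cp++) {
            if (!Character.isDefined(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Varies with the Unicode version of the running JDK.
        System.out.println(countUndefinedInBmp());

        int cp = 0xFFFF; // a Unicode noncharacter, so it has no assigned character
        System.out.println(Character.isDefined(cp));                        // false
        System.out.println(Character.isValidCodePoint(cp));                 // true
        System.out.println(Character.getType(cp) == Character.UNASSIGNED);  // true

        // An unassigned code point still round-trips through a String.
        String s = new String(Character.toChars(cp));
        System.out.println(s.codePointAt(0) == cp);                         // true
    }
}
```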