I got into an interesting discussion in a forum about the naming of variables.
Conventions aside, I noticed that it is legal for a variable name to be a single Unicode character given as an escape sequence; for example, the following is legal:
int \u1234;
However, if I give it the name #, for example, it produces an error. According to Sun's tutorial, a name is valid if it begins "with a letter, the dollar sign "$", or the underscore character "_"."
But U+1234 is some Ethiopic character. So what is actually defined as a "letter"?
A variable's name can be any legal identifier — an unlimited-length sequence of Unicode letters and digits, beginning with a letter, the dollar sign "$", or the underscore character "_". The convention, however, is to always begin your variable names with a letter, not "$" or "_".
For variables, the Java naming convention is to start with a lowercase letter and capitalize the first letter of every subsequent word. Variable names in Java cannot contain whitespace, so names formed from compound words are written in lower camel case, as in the sketch below.
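A minimal sketch of that convention; the field names here are made up purely for illustration:

```java
// Hypothetical field names illustrating the lower camel case convention.
class NamingConventionExample {
    int itemCount;               // single word: all lowercase
    double averageSpeed;         // compound word: capitalize the second word
    String customerFirstName;    // longer compound names follow the same pattern
}
```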
Unicode escape sequences can be used anywhere in Java source code, and an identifier may be made up of any Unicode characters that the language treats as letters or digits. You may use Unicode in comments, identifiers, character content, and string literals, as well as other text.
Unicode is a computing industry standard designed to consistently and uniquely encode characters used in written languages throughout the world. The Unicode standard uses hexadecimal to express a character. For example, the value 0x0041 represents the Latin character A.
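Here is a minimal sketch of both points; the class name and the variable values are just illustrative:

```java
// A small demo: Unicode escapes appear in an identifier, a char literal,
// and a string literal. The escapes are translated before compilation.
public class UnicodeEscapeDemo {
    public static void main(String[] args) {
        char a = '\u0041';            // 0x0041 (U+0041) is the Latin capital letter A
        int \u1234 = 42;              // the Ethiopic character from the question, used as an identifier
        String s = "\u0041BC";        // string literals may contain escapes too
        System.out.println(a);        // prints: A
        System.out.println(\u1234);   // prints: 42
        System.out.println(s);        // prints: ABC
    }
}
```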
The Unicode standard defines what counts as a letter.
From the Java Language Specification, section 3.8:
Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages.
A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true. A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true.
From the Character documentation for isJavaIdentifierPart:
Determines if the character (Unicode code point) may be part of a Java identifier as other than the first character. A character may be part of a Java identifier if any of the following are true:
- it is a letter
- it is a currency symbol (such as '$')
- it is a connecting punctuation character (such as '_')
- it is a digit
- it is a numeric letter (such as a Roman numeral character)
- it is a combining mark
- it is a non-spacing mark
- isIdentifierIgnorable(codePoint) returns true for the character
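A quick sketch probing a few of those categories; the chosen code points are merely examples:

```java
public class IdentifierPartDemo {
    public static void main(String[] args) {
        System.out.println(Character.isJavaIdentifierPart('a'));      // true: a letter
        System.out.println(Character.isJavaIdentifierPart('$'));      // true: a currency symbol
        System.out.println(Character.isJavaIdentifierPart('_'));      // true: connecting punctuation
        System.out.println(Character.isJavaIdentifierPart('7'));      // true: a digit
        System.out.println(Character.isJavaIdentifierPart('\u2164')); // true: Roman numeral five, a numeric letter
        System.out.println(Character.isJavaIdentifierPart('#'));      // false: none of the categories apply
    }
}
```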
Unicode characters fall into character classes, and there is a set of Unicode characters that fall into the class "letter". In Java this is determined by Character.isLetter(c), but for identifiers Character.isJavaIdentifierStart(c) and Character.isJavaIdentifierPart(c) are more relevant.
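A minimal sketch of that distinction; the printed results are simply what the Character methods return for these particular code points:

```java
public class LetterVsIdentifier {
    public static void main(String[] args) {
        System.out.println(Character.isLetter('$'));                    // false: '$' is a currency symbol, not a letter
        System.out.println(Character.isJavaIdentifierStart('$'));       // true: it may still start an identifier
        System.out.println(Character.isLetter('_'));                    // false: connecting punctuation
        System.out.println(Character.isJavaIdentifierStart('_'));       // true
        System.out.println(Character.isLetter('\u1234'));               // true: an Ethiopic letter
        System.out.println(Character.isJavaIdentifierStart('\u1234'));  // true
    }
}
```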
For the relevant Unicode spec, see this.