
Java Unicode variable names

I got into an interesting discussion in a forum where we discussed the naming of variables.

Conventions aside, I noticed that it is legal for a variable to have the name of a Unicode character, for example the following is legal:

int \u1234;

However, if I name it #, for example, the compiler reports an error. According to Sun's tutorial, a name is valid when "beginning with a letter, the dollar sign "$", or the underscore character "_"."

But U+1234 is an Ethiopic character. So what exactly counts as a "letter"?

pg-robban asked Sep 14 '09 16:09





2 Answers

The Unicode standard defines what counts as a letter.

From the Java Language Specification, section 3.8:

Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages.

A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true. A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true.

From the Character documentation for isJavaIdentifierPart:

Determines if the character (Unicode code point) may be part of a Java identifier as other than the first character. A character may be part of a Java identifier if any of the following are true:

  • it is a letter
  • it is a currency symbol (such as '$')
  • it is a connecting punctuation character (such as '_')
  • it is a digit
  • it is a numeric letter (such as a Roman numeral character)
  • it is a combining mark
  • it is a non-spacing mark
  • isIdentifierIgnorable(codePoint) returns true for the character
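
To see how these rules play out for the names in the question, here is a minimal sketch that applies Character.isJavaIdentifierStart to the first code point and Character.isJavaIdentifierPart to the rest. The class name IdentifierCheck and the helper isValidIdentifier are made up for illustration. The helper only tests the character rules quoted above; it does not reject reserved words such as int, which the compiler also refuses.

```java
public class IdentifierCheck {

    /**
     * Hypothetical helper: returns true if every code point in name is allowed
     * at its position by the Character methods quoted above.
     * It does NOT check for keywords or literals (int, true, null, ...).
     */
    static boolean isValidIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        // The first code point must be a "Java letter".
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Every following code point must be a "Java letter-or-digit".
        // Step by code point so supplementary characters are handled correctly.
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // The first entry is the Ethiopic character U+1234 from the question.
        String[] candidates = {"\u1234", "#", "$total", "_tmp", "9lives", "x9"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isValidIdentifier(candidate));
        }
    }
}
```

With this check, the Ethiopic character passes and # does not, because # is neither a letter, a currency symbol, nor connecting punctuation. A digit such as 9 is accepted only after the first position.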
Jon Skeet answered Oct 31 '22 02:10



Unicode characters fall into character classes. There's a set of Unicode characters which fall into the class "letter".

In Java, membership in that class is determined by Character.isLetter(c). For identifiers, however, Character.isJavaIdentifierStart(c) and Character.isJavaIdentifierPart(c) are more relevant.
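
The difference between the three methods is easiest to see side by side. The following is only an illustrative sketch; the class name LetterVersusIdentifierStart and the describe helper are invented for this example, and the sample characters are my own choice.

```java
public class LetterVersusIdentifierStart {

    /** Hypothetical helper: formats the three checks for one code point. */
    static String describe(int codePoint) {
        return String.format("U+%04X letter=%b identifierStart=%b identifierPart=%b",
                codePoint,
                Character.isLetter(codePoint),
                Character.isJavaIdentifierStart(codePoint),
                Character.isJavaIdentifierPart(codePoint));
    }

    public static void main(String[] args) {
        // 0x1234 is the Ethiopic character from the question; the rest are ASCII.
        int[] samples = {0x1234, '$', '_', '7', '#'};
        for (int codePoint : samples) {
            System.out.println(describe(codePoint));
        }
    }
}
```

The dollar sign and the underscore are not letters according to isLetter, yet both may start an identifier. A digit may appear in an identifier but not at the start, and # fails all three checks.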

For the relevant Unicode spec, see this.

Vinay Sajip answered Oct 31 '22 01:10
