
Valid identifier characters in Scala

One thing I find quite confusing is knowing which characters and combinations I can use in method and variable names. For instance:

    val #^ = 1 // legal
    val #  = 1 // illegal
    val +  = 1 // legal
    val &+ = 1 // legal
    val &2 = 1 // illegal
    val £2 = 1 // legal
    val ¬  = 1 // legal

As I understand it, there is a distinction between alphanumeric identifiers and operator identifiers. You can mix and match one or the other but not both, unless separated by an underscore (a mixed identifier).

From Programming in Scala section 6.10,

An operator identifier consists of one or more operator characters. Operator characters are printable ASCII characters such as +, :, ?, ~ or #.

More precisely, an operator character belongs to the Unicode set of mathematical symbols(Sm) or other symbols(So), or to the 7-bit ASCII characters that are not letters, digits, parentheses, square brackets, curly braces, single or double quote, or an underscore, period, semi-colon, comma, or back tick character.

So we are excluded from using ()[]{}'"_.;, and `

I looked up Unicode mathematical symbols on Wikipedia, but the ones I found didn't include +, :, ? etc. Is there a definitive list somewhere of what the operator characters are?

Also, any ideas why Unicode mathematical operators (rather than symbols) do not count as operators?

asked Oct 05 '11 05:10 by Luigi Plinge



1 Answer

Working from the EBNF syntax in the spec:

    upper  ::= ‘A’ | ... | ‘Z’ | ‘$’ | ‘_’ and Unicode category Lu
    lower  ::= ‘a’ | ... | ‘z’ and Unicode category Ll
    letter ::= upper | lower and Unicode categories Lo, Lt, Nl
    digit  ::= ‘0’ | ... | ‘9’
    opchar ::= “all other characters in \u0020-007F and Unicode
               categories Sm, So except parentheses ([]) and periods”

But also taking into account the very beginning of the Lexical Syntax chapter, which defines:

    Parentheses ‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘{’ | ‘}’.
    Delimiter characters ‘`’ | ‘'’ | ‘"’ | ‘.’ | ‘;’ | ‘,’

Here is what I come up with. Working by elimination in the range \u0020-007F, eliminating letters, digits, parentheses and delimiters, we have for opchar... (drumroll):

! # % & * + - / : < = > ? @ \ ^ | ~ and also Sm and So - except for parentheses and periods.
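As a rough cross-check of the elimination above, here is a minimal sketch that redoes it mechanically. The object and member names (`AsciiOpChars`, `opChars`) are made up for illustration, and the exclusion lists are my reading of the grammar quoted above, not anything taken from the compiler.

```scala
object AsciiOpChars {
  // Characters the lexical syntax sets aside as parentheses and delimiters.
  private val parens = "()[]{}"
  private val delimiters = "`'\".;,"

  // '$' and '_' are counted as (uppercase) letters by the grammar quoted above.
  private def isLetterLike(c: Char): Boolean =
    c.isLetter || c == '$' || c == '_'

  // Walk the range 0x20 to 0x7F and keep whatever survives the eliminations.
  val opChars: String =
    (0x20 to 0x7F)
      .map(_.toChar)
      .filterNot(c => c.isWhitespace || c.isControl) // drops space and DEL
      .filterNot(c => isLetterLike(c) || c.isDigit)
      .filterNot(c => parens.contains(c) || delimiters.contains(c))
      .mkString
}
```

`AsciiOpChars.opChars` should come out as the same 18 characters listed above, in ASCII order.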

(Edit: adding valid examples here.) In summary, here are some valid examples that highlight all cases. Watch out for \ in the REPL; I had to escape it as \\:

    val !#%&*+-/:<=>?@\^|~ = 1 // all simple opchars
    val simpleName = 1
    val withDigitsAndUnderscores_ab_12_ab12 = 1
    val wordEndingInOpChars_!#%&*+-/:<=>?@\^|~ = 1
    val !^©® = 1 // opchars and symbols
    val abcαβγ_!^©® = 1 // mixing unicode letters and symbols

Note 1:

I found this Unicode category index to figure out Lu, Ll, Lo, Lt, Nl:

  • Lu (uppercase letters)
  • Ll (lowercase letters)
  • Lo (other letters)
  • Lt (titlecase)
  • Nl (letter numbers like roman numerals)
  • Sm (symbol math)
  • So (symbol other)
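If you would rather not look characters up in an index by hand, the JVM can report a character's general category. This is a minimal sketch, assuming `java.lang.Character`'s tables agree with the Unicode version the spec has in mind; `UnicodeCategory` and `abbreviation` are hypothetical names, and only the categories discussed here (plus Sc) are mapped.

```scala
object UnicodeCategory {
  // java.lang.Character type constants mapped to the two-letter
  // category abbreviations used in the grammar.
  private val names: Map[Int, String] = Map(
    Character.UPPERCASE_LETTER.toInt -> "Lu",
    Character.LOWERCASE_LETTER.toInt -> "Ll",
    Character.OTHER_LETTER.toInt     -> "Lo",
    Character.TITLECASE_LETTER.toInt -> "Lt",
    Character.LETTER_NUMBER.toInt    -> "Nl",
    Character.MATH_SYMBOL.toInt      -> "Sm",
    Character.OTHER_SYMBOL.toInt     -> "So",
    Character.CURRENCY_SYMBOL.toInt  -> "Sc"
  )

  // Returns the abbreviation, or "other" for categories not listed above.
  def abbreviation(c: Char): String =
    names.getOrElse(Character.getType(c), "other")
}
```

For example, `UnicodeCategory.abbreviation('+')` gives "Sm", while ':' and '?' fall into "other" (they are punctuation) and only qualify as opchars through the \u0020-007F clause.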

Note 2:

    val #^ = 1 // legal   - two opchars
    val #  = 1 // illegal - reserved word like class or => or @
    val +  = 1 // legal   - opchar
    val &+ = 1 // legal   - two opchars
    val &2 = 1 // illegal - opchar and letter do not mix arbitrarily
    val £2 = 1 // working - £ is part of Sc (Symbol currency) - undefined by spec
    val ¬  = 1 // legal   - part of Sm

Note 3:

Other operator-looking things that are reserved words: _ : = => <- <: <% >: # @ and also \u21D2 ⇒ and \u2190 ←
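Putting the opchar rule and the reserved words together, a pure operator identifier can be approximated with a small predicate. This is only a sketch under my reading of the spec: `OperatorIdentifiers`, `isOpChar` and `isOperatorIdentifier` are hypothetical names, and it ignores lexer details such as `//` and `/*` starting a comment.

```scala
object OperatorIdentifiers {
  // The ASCII opchars found by elimination earlier in this answer.
  private val asciiOpChars = "!#%&*+-/:<=>?@\\^|~"

  // Operator-looking reserved words from Note 3, including the two arrows.
  private val reserved = Set(
    "_", ":", "=", "=>", "<-", "<:", "<%", ">:", "#", "@", "\u21D2", "\u2190")

  // An opchar is one of the ASCII characters above, or anything in Sm or So.
  def isOpChar(c: Char): Boolean = {
    val category = Character.getType(c)
    asciiOpChars.contains(c) ||
      category == Character.MATH_SYMBOL.toInt ||
      category == Character.OTHER_SYMBOL.toInt
  }

  // One or more opchars, and not one of the reserved words.
  def isOperatorIdentifier(s: String): Boolean =
    s.nonEmpty && s.forall(isOpChar) && !reserved.contains(s)
}
```

Run against the examples from the question, this accepts `#^`, `+`, `&+` and `¬`, and rejects `#` (reserved) and `&2` (opchar followed by a digit). It also rejects `£2`, which matches the spec but not the compiler behaviour observed in Note 2.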

answered Sep 17 '22 20:09 by huynhjl