
Java expression interpretation rules of decrement/increment operators

This is a purely theoretical question; for clarity's sake, I wouldn't normally write code like this.

Why is this quite ambiguous statement legal

int a = 1, b = 2;
int c = a---b; // a=0, b=2, c=-1

(it is interpreted as a-- -b)

and this one isn't?

int c = a-----b;

The first statement could also be interpreted as a - --b, whereas the second statement clearly has only one sensible interpretation: a-- - --b.

Also another curious one:

int c = a--- -b; // a=0, b=2, c=3

(and int c = a----b; isn't a legal statement)

How is expression interpretation defined in Java? I tried searching the JLS, but haven't found an answer to this.

radoh asked Mar 17 '16 12:03




1 Answer

Introduction

To understand this, it helps to know that a compiler typically recognizes the source language at two levels: the lexical level and the syntactical level.

The lexical level (the "lexer") splits the source code into tokens: literals (string/numeric/char), operators, identifiers, and other elements of the lexical grammar. These are the "words" and "punctuation characters" of the programming language.

The syntactical level (the "parser") turns these low-level lexical tokens into syntax, usually represented by syntax trees.

The lexer is the level that needs to know whether a token is a "minus" token (-) or a "decrement" token (--). (Whether the minus token is a unary or a binary minus, and whether the decrement token is a post- or pre-decrement, is determined at the syntactical level.)

Things like precedence and left-to-right versus right-to-left only exist at the syntactical level. But whether a---b is a -- - b or a - -- b is determined at the lexical level.
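As a small illustration of why the token boundaries matter, the sketch below writes each reading with explicit white space so that the lexer has no choice to make. The class and method names are made up for this example; the starting values a = 1 and b = 2 are the ones from the question.

```java
import java.util.Arrays;

class TokenBoundarySketch {
    // The reading the lexer picks for "a---b": post-decrement a, then subtract b.
    static int[] postDecrementThenMinus() {
        int a = 1, b = 2;
        int c = a-- - b;
        return new int[] {a, b, c};
    }

    // The other conceivable reading; it is only reachable by writing the space.
    static int[] minusThenPreDecrement() {
        int a = 1, b = 2;
        int c = a - --b;
        return new int[] {a, b, c};
    }

    // Five dashes become legal once white space separates the tokens.
    static int[] spacedFiveDashes() {
        int a = 1, b = 2;
        int c = a-- - --b;
        return new int[] {a, b, c};
    }

    public static void main(String[] args) {
        // Each array is printed as [a, b, c].
        System.out.println(Arrays.toString(postDecrementThenMinus())); // [0, 2, -1]
        System.out.println(Arrays.toString(minusThenPreDecrement()));  // [1, 1, 0]
        System.out.println(Arrays.toString(spacedFiveDashes()));       // [0, 1, 0]
    }
}
```

The two readings of the three-dash expression leave a, b and c in different states, so which one is chosen is not a cosmetic matter.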

Answer

Why a---b becomes a -- - b is described in the Java Language Specification section 3.2 "Lexical Translations":

The longest possible translation is used at each step, even if the result does not ultimately make a correct program while another lexical translation would.

So the longest possible lexical token is formed.

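In the case of a---b, the lexer produces the token a, then -- (the longest possible match), then - (the only token that can be formed from the remaining dash), then b.

In the case of a-----b, the input is translated into a, --, --, -, b, which is not a valid program: the second -- would have to be applied to the result of a--, and the operand of a decrement must be a variable, not a value.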
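The longest-match rule (often called "maximal munch") can be sketched with a toy tokenizer. This is not how javac is implemented; it is a minimal sketch that only knows identifiers, white space, and the two operators - and --, and the class name, method name and operator table are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class MaximalMunchSketch {
    // Hypothetical, heavily reduced operator table. Longer operators come
    // first so that the longest possible match is always tried first.
    private static final String[] OPERATORS = {"--", "-"};

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++; // white space ends the current token and is then discarded
                continue;
            }
            if (Character.isJavaIdentifierStart(ch)) {
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i));
                continue;
            }
            boolean matched = false;
            for (String op : OPERATORS) {
                if (src.startsWith(op, i)) {
                    tokens.add(op);
                    i += op.length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("Unexpected character at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a---b"));     // [a, --, -, b]
        System.out.println(tokenize("a-----b"));   // [a, --, --, -, b]
        System.out.println(tokenize("a--- -b"));   // [a, --, -, -, b]
        System.out.println(tokenize("a----b"));    // [a, --, --, b]
        System.out.println(tokenize("a-- - --b")); // [a, --, -, --, b]
    }
}
```

The tokenizer never looks ahead to check whether the resulting token sequence forms a valid expression, which mirrors the quoted rule: the longest translation is used even when a shorter one would have produced a correct program.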

The lexical translation process has three steps, and the longest-match rule above applies to step 3. To quote a bit further:

A raw Unicode character stream is translated into a sequence of tokens, using the following three lexical translation steps, which are applied in turn:

  1. A translation of Unicode escapes (§3.3) in the raw stream of Unicode characters to the corresponding Unicode character. A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the UTF-16 code unit whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters.

  2. A translation of the Unicode stream resulting from step 1 into a stream of input characters and line terminators (§3.4).

  3. A translation of the stream of input characters and line terminators resulting from step 2 into a sequence of input elements (§3.5) which, after white space (§3.6) and comments (§3.7) are discarded, comprise the tokens (§3.5) that are the terminal symbols of the syntactic grammar (§2.3).

(Tokens are the "input elements" that remain once white space and comments have been discarded.)
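Because step 1 runs before any tokens are formed, a decrement operator can even be spelled with Unicode escapes. The sketch below is only meant to show that ordering; the class and method names are invented for the example, and \u002d is the escape for the hyphen-minus character.

```java
class UnicodeEscapeSketch {
    // Hypothetical helper, for illustration only.
    static int decrementWithEscapes() {
        int a = 1;
        a\u002d\u002d; // after step 1 this statement reads "a--;"
        return a;
    }

    public static void main(String[] args) {
        System.out.println(decrementWithEscapes()); // prints 0
    }
}
```

By the time step 3 forms tokens, the two escapes are ordinary dash characters, so the longest-match rule joins them into a single -- token as usual.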

Erwin Bolwidt answered Sep 21 '22 10:09