
Java is said to ignore extra whitespace. Why does c=a++ + ++b not compile without the spaces?

In all the books on Java I've read, the compiler is said to treat all whitespace the same way and simply ignore extra whitespace, so it's best practice to use it liberally to improve code readability. I've found proof of that in every expression I've written: it didn't matter whether there were spaces or not, or how many (or maybe I just didn't pay attention).

Recently I decided to experiment a little with operator precedence and associativity to test the precedence table in action and tried to compile

int a = 2;
int b = 3;    
int c = a+++b;
int d = a+++++b;

While the former statement compiled perfectly, the latter produced an exception:

Exception in thread "main" java.lang.RuntimeException: Uncompilable source code - unexpected type. Required: variable. Found: value.

However, when I added spaces: int d = a++ + ++b, it compiled. Why is this the case? Java is said to ignore extra whitespace anyway. (I have Java 8 and Netbeans IDE 8.2, if this matters.)

I guess this might have something to do with how expressions are parsed, but I'm not sure. I tried looking up several questions on parsing, whitespace, and operators on SO and on Google but couldn't find a definitive answer.

UPD. To address the comments that it's the 'extra' that matters, not all whitespace: since int c = a++ + b; and int c=a+++b; both compile, one could say, by analogy, that in int d = a ++ + ++b; whitespace is 'extra' as well.

John Allison asked Dec 05 '22 10:12


2 Answers

Java Language Specification section 3.2, "Lexical Translations", says (emphasis mine):

A raw Unicode character stream is translated into a sequence of tokens, using the following three lexical translation steps, which are applied in turn:

  1. A translation of Unicode escapes [...]

  2. A translation [...] into a stream of input characters and line terminators [...].

  3. A translation of the stream of input characters and line terminators resulting from step 2 into a sequence of input elements (§3.5) which, after white space (§3.6) and comments (§3.7) are discarded, comprise the tokens (§3.5) that are the terminal symbols of the syntactic grammar (§2.3).

The longest possible translation is used at each step, even if the result does not ultimately make a correct program while another lexical translation would.

So white space characters are discarded, but only after the "sequence of input elements" has been determined. Section 3.5, "Input Elements and Tokens", says:

White space (§3.6) and comments (§3.7) can serve to separate tokens that, if adjacent, might be tokenized in another manner. For example, the ASCII characters - and = in the input can form the operator token -= (§3.12) only if there is no intervening white space or comment.
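
The longest-match rule quoted above can be illustrated with a small sketch. This is not javac's actual lexer: the class name MaximalMunchDemo, the tokenize helper, and the reduced operator table are assumptions made for this example, and it only handles identifiers and a handful of operators.

```java
import java.util.ArrayList;
import java.util.List;

final class MaximalMunchDemo {
    // Simplified operator table (an assumption of this sketch), longest operators first.
    private static final String[] OPERATORS = {"++", "--", "+=", "-=", "+", "-", "="};

    // Splits src into tokens, always taking the longest operator that matches.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++; // white space ends the current token and is then discarded
                continue;
            }
            if (Character.isJavaIdentifierStart(ch)) {
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i));
                continue;
            }
            String matched = null;
            for (String op : OPERATORS) {
                if (src.startsWith(op, i)) {
                    matched = op; // first hit is the longest, given the table order
                    break;
                }
            }
            if (matched == null) {
                throw new IllegalArgumentException("Unexpected character at index " + i);
            }
            tokens.add(matched);
            i += matched.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a+++b"));     // [a, ++, +, b]
        System.out.println(tokenize("a+++++b"));   // [a, ++, ++, +, b]
        System.out.println(tokenize("a++ + ++b")); // [a, ++, +, ++, b]
        System.out.println(tokenize("x-=y"));      // [x, -=, y]
        System.out.println(tokenize("x- =y"));     // [x, -, =, y]
    }
}
```

The sketch never looks ahead to check whether the resulting token sequence forms a valid expression, which mirrors the JLS wording that the longest translation is used "even if the result does not ultimately make a correct program".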

Daniel Pryden answered Dec 07 '22 23:12


The syntax analyzer needs to understand what you are writing.

A sequence of plus symbols is not understandable from the syntax analyzer's point of view unless you put a separator between them.

Adding more spaces beyond the minimum doesn't change the result.

So both lines have the same result:

int d = a++ + ++b;
int d = a++     +     ++b;

Consider instead the following code:

int d = a +++ b;

What is your intent?

int d = a + ++b;

or

int d = a++ + b;

From a human point of view, too, it is not possible to tell which one was meant without the extra white space.

Even if this code works for the compiler, it is not understandable from a human point of view.

The sequence a+++++b without spaces is not understandable from the compiler's point of view because it reads as many characters as possible to determine each token. The result is the sequence a ++ ++ + b, which is not a valid sequence of tokens: the second ++ would have to be applied to the value of a++ rather than to a variable, which is what the "Required: variable. Found: value." message reports.
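
To check how the compiler actually groups the plus signs, one can compare the results of the unspaced and spaced forms. This is a minimal sketch: the class name PlusGroupingDemo and its two helper methods are made up for this example, and the starting values a = 2 and b = 3 are taken from the question.

```java
final class PlusGroupingDemo {
    // Returns {c, a, b} after evaluating c = a+++b with a = 2, b = 3.
    static int[] triplePlus() {
        int a = 2;
        int b = 3;
        int c = a+++b;     // tokenized as a ++ + b, i.e. (a++) + b
        return new int[] {c, a, b};
    }

    // Returns {d, a, b} after evaluating d = a++ + ++b with a = 2, b = 3.
    static int[] spaced() {
        int a = 2;
        int b = 3;
        int d = a++ + ++b; // the spaces yield the tokens a ++ + ++ b
        return new int[] {d, a, b};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(triplePlus())); // [5, 3, 3]
        System.out.println(java.util.Arrays.toString(spaced()));     // [6, 3, 4]
    }
}
```

In the first method b is left unchanged and a is incremented, so a+++b was read as a++ + b and not as a + ++b.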


In any case, my suggestion is to keep your code as clean as possible from a human point of view, so that it is easier to read, maintain, and enhance. Use spaces when needed and don't abuse them, but don't remove them if the resulting code is less readable.

Davide Lorenzo MARINO answered Dec 07 '22 22:12