I understand that semicolons indicate the end of a line in languages like Java, but why?
I get asked this a lot by other people, and I can't really think of a good way to explain why it works better than just using line breaks or whitespace.
Slightly aged languages (some would call them business standards), such as C/C++, Java, and C#, require you to put a semicolon at the end of every statement. Very old languages, such as COBOL or ABAP, use a different terminator, such as a period.
Python is supposed to be clean and readable. Syntactic characters like semicolons add unnecessary clutter. If you send code that crams several statements onto one line with semicolons to an experienced Python programmer, you will never hear the end of it. Forcing multiple statements onto one line makes even trivial code harder to read.
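As a rough illustration of the point (the variable names are made up for this sketch), Python does accept semicolons between statements on one line, but the idiomatic form puts each statement on its own line:

```python
# Legal Python, but discouraged: three statements squeezed onto one line.
width = 3; height = 4; area_one_line = width * height

# Idiomatic: one statement per line, no terminator needed.
width = 3
height = 4
area = width * height

# Both forms compute the same thing; only readability differs.
print(area_one_line == area)
```

Both versions behave identically, so the objection is purely about readability.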
The science of semicolons is very simple: the syntax of C# is such that semicolons are necessary to avoid ambiguity. Python's designers had the luxury of building a syntax that avoids explicit statement terminators from the very start, and they chose significant whitespace instead.
They don't signal end of line, they signal end of statement.
There are some languages that don't require them, but those languages typically don't allow multiple statements on a single line, or a single statement to span multiple lines, without some other signal (like VB's _ line-continuation character).
Why do some languages allow multiple statements on a line? The philosophy is that whitespace is irrelevant (an end-of-line character is just whitespace). This allows flexibility in how the code is formatted, because formatting is not part of the semantic meaning.
First of all, the semicolon is a statement separator, not a line separator. Some languages use the new line character as statement separator, but languages which ignore all whitespace tend to use the semicolon.
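Python is one language that uses the newline as its statement separator. A small sketch with the standard `tokenize` module makes this visible: the tokenizer emits a `NEWLINE` token where a statement ends, and a separate `NL` token for a line break that does not end a statement (here, one inside parentheses). The sample source string is invented for the illustration.

```python
import io
import tokenize

# Two statements; the second one spans two physical lines inside parentheses.
source = "x = 1\ntotal = (x +\n         2)\n"

kinds = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# NEWLINE marks the end of a statement; NL is a line break that does not.
print("statement-ending newlines:", kinds.count("NEWLINE"))
print("non-terminating newlines:", kinds.count("NL"))
```

So even in a newline-separated language, the separator is a distinct token, not just any line break.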
A language ignores whitespace to allow programmers to format the source code however they like. For example, in Java there is no difference between
if (welcome) System.out.println("hello world");
and
if (welcome)
    System.out.println("hello world");
This is not because there is one separate case for each of these in the grammar of the language, but because the whitespace is simply ignored.
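One way to picture "whitespace is simply ignored" is to look at the token stream a lexer would hand to the parser. The sketch below uses a deliberately simplified regular-expression tokenizer (an assumption for illustration, not a real Java lexer) and checks that a one-line and a two-line layout of the same statement produce identical tokens.

```python
import re

# Simplified token pattern: string literals, words, or single punctuation marks.
TOKEN = re.compile(r'"[^"]*"|\w+|[^\w\s]')

def tokens(source):
    """Return the tokens of source; whitespace between tokens is discarded."""
    return TOKEN.findall(source)

one_line = 'if (welcome) System.out.println("hello world");'
two_lines = 'if (welcome)\n    System.out.println("hello world");'

# The parser only ever sees the tokens, so both layouts look the same to it.
print(tokens(one_line) == tokens(two_lines))
```

Because the line break never reaches the parser, the grammar needs only one rule for the statement.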
This is the core of the question. To understand it, let's consider a small language without any statement separator. It contains the following statement types:
var x = foo()
y[0, 1] = x
bar()
Here, y is a two-dimensional array and x is written to one of the entries of y.
Now let's look at these statements the way the compiler would see them:
var x = foo() y[0, 1] = x bar()
Because there is no statement separator, the compiler has to recognize the end of each statement by itself, to make sense of the input. Is the compiler able to do so? I guess in the above example the compiler can do it.
Now, let's add another type of statement to our language:
[x, y] = ["hello", "world"]
The multi assignment allows the programmer to assign multiple values at once. After this line, the variable x will contain the value "hello" while the variable y contains "world". This might be really handy to allow multiple return values from a function. Now how does this work together with the remaining statement types?
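For comparison, Python happens to accept this same multi-assignment form. A minimal sketch, where `min_max` is a hypothetical helper invented for the example:

```python
def min_max(values):
    """Return the smallest and largest element as a pair."""
    return min(values), max(values)

# Multi assignment: both names are bound in a single statement.
[x, y] = ["hello", "world"]

# The same form unpacks several return values from one function call.
[lowest, highest] = min_max([3, 1, 2])

print(x, y, lowest, highest)
```

Python can afford this syntax because a newline ends the previous statement, which is exactly the property the toy language lacks.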
Consider the following sequence of statements:
foo()
[x, y] = [1, 2]
First, we call the method foo. Afterwards, we assign 1 to x and 2 to y. At least this is what we meant to do. Here is what the compiler sees:
foo() [x, y] = [1, 2]
Is the compiler able to recognize each statement? No. There are at least two possible interpretations. The first is the one we intended. Here is the second one:
foo()[x, y] = [1, 2]
What does this mean? First, we call the method foo. This method is supposed to return a two-dimensional array. Now, we write the array [1, 2] at the position [x, y] in the returned array.
The compiler cannot recognize the statements, since there are at least two valid interpretations of the given input. Of course, this should never happen in a real programming language. In the given example it might be easy to resolve, but the point is that it is hard to design a programming language without a statement separator that is unambiguous. It is hard because the language designer has to consider every possible sequence of statement types to be sure the language is not ambiguous.
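The ambiguity can be reproduced mechanically. The following is a minimal sketch, assuming a toy grammar that I have guessed from the four statement types in this answer (declaration, call, element assignment, multi assignment); all function names are hypothetical. It enumerates every way a whitespace-free token stream can be split into statements, with and without a terminator.

```python
import re

TOKEN = re.compile(r'"[^"]*"|\w+|[^\w\s]')

def tokenize(source):
    """Split source into tokens; all whitespace, newlines included, is dropped."""
    return TOKEN.findall(source)

# Each parser takes (tokens, start) and yields every position where it can end.
def lit(text):
    def parse(toks, i):
        if i < len(toks) and toks[i] == text:
            yield i + 1
    return parse

def match(predicate):
    def parse(toks, i):
        if i < len(toks) and predicate(toks[i]):
            yield i + 1
    return parse

def seq(*parsers):
    def parse(toks, i):
        positions = [i]
        for parser in parsers:
            positions = [end for pos in positions for end in parser(toks, pos)]
        yield from positions
    return parse

def alt(*parsers):
    def parse(toks, i):
        for parser in parsers:
            yield from parser(toks, i)
    return parse

def value(toks, i):
    # A plain function so that the grammar below can refer to itself.
    yield from VALUE(toks, i)

ident = match(lambda t: t.isidentifier() and t != "var")
constant = match(lambda t: t.isdigit() or t.startswith('"'))
call = seq(ident, lit("("), lit(")"))
pair = seq(lit("["), value, lit(","), value, lit("]"))
VALUE = alt(constant, ident, call, pair)

STATEMENTS = [
    seq(lit("var"), ident, lit("="), value),        # var x = foo()
    call,                                           # bar()
    seq(alt(ident, call), pair, lit("="), value),   # y[0, 1] = x
    seq(lit("["), ident, lit(","), ident, lit("]"), lit("="), value),  # [x, y] = ...
]

def interpretations(source, terminator=None):
    """Return every way to split source into a sequence of statements."""
    toks = tokenize(source)

    def split(i):
        if i == len(toks):
            yield []
            return
        for statement in STATEMENTS:
            for end in statement(toks, i):
                rest_start = end
                if terminator is not None:
                    # With a terminator, a statement only counts if one follows.
                    if end >= len(toks) or toks[end] != terminator:
                        continue
                    rest_start = end + 1
                for rest in split(rest_start):
                    yield [" ".join(toks[i:end])] + rest

    return list(split(0))

# Every reading of the ambiguous input, one list of statements per reading.
for reading in interpretations("foo() [x, y] = [1, 2]"):
    print(reading)
```

Under these assumptions, `interpretations("foo() [x, y] = [1, 2]")` returns two readings (a call followed by a multi assignment, and a single element assignment), whereas `interpretations("foo(); [x, y] = [1, 2];", terminator=";")` returns exactly one. The earlier three-statement example, `var x = foo() y[0, 1] = x bar()`, also has exactly one reading, which matches the observation that the language was fine until the new statement type was added.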
Thus, the statement separator helps the language designer with the initial design of the language, but more importantly it makes the language easy to extend in the future, for example by adding new statement types. This is a big deal: once code has been written in your language, you cannot simply change the grammar of existing statement types, because all the existing code would stop compiling.
To sum it all up, the semicolon was introduced as a statement separator in whitespace-ignoring languages because it is easier to design and extend a language that has a statement separator.