I understand that semicolons indicate the end of a line in languages like Java, but why? I get asked this a lot by other people, and I can't think of a good way to explain why semicolons work better than just using line breaks or whitespace.


Why do some languages need semicolons?

2 Answers

They don't signal end of line, they signal end of statement.

There are some languages that don't require them, but those languages don't allow multiple statements on a single line or a single statement to span multiple lines (without some other signal, like VB's _ line-continuation character).

Why do some languages allow multiple statements on a line? The philosophy is that whitespace is irrelevant (an end-of-line character is whitespace). This allows flexibility in how the code is formatted, since formatting is not part of the semantic meaning.
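As a small illustration in Java (the language from the question), the sketch below puts two statements on one line and spreads one statement across two lines. The compiler accepts both because only the semicolons mark where statements end. The class and method names are made up for the example.

```java
// Hypothetical example class: statement boundaries come from ';', not from line breaks.
class StatementLayout {
    static int demo() {
        int a = 1; int b = 2;   // two statements on one line
        int sum = a
                + b;            // one statement spread over two lines
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3
    }
}
```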

answered Sep 24 '22 15:09

Tergiver

First of all, the semicolon is a statement separator, not a line separator. Some languages use the newline character as the statement separator, but languages which ignore all whitespace tend to use the semicolon.

Why do languages ignore whitespace?

A language ignores whitespace to allow programmers to format the source code however they like. For example, in Java there is no difference between

if (welcome)
    System.out.println("hello world");

and

if (welcome) System.out.println("hello world");

This is not because the grammar of the language has a separate rule for each of these layouts, but because the whitespace is simply ignored.
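One way to picture "the whitespace is simply ignored" is at the level of the lexer, which splits the source text into tokens before the grammar is consulted. The following is a minimal sketch assuming a deliberately tiny, hypothetical token set (identifiers, string literals and a few punctuation characters), not Java's real lexical grammar. Both layouts of the if statement above produce exactly the same token list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy lexer (hypothetical token set): whitespace only separates tokens and is then discarded.
class WhitespaceDemo {
    // String literals, identifiers, and single punctuation characters.
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|[A-Za-z_]\\w*|[(){};.]");

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {   // find() skips everything between tokens, e.g. spaces and newlines
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String twoLines = "if (welcome)\n    System.out.println(\"hello world\");";
        String oneLine = "if (welcome) System.out.println(\"hello world\");";
        System.out.println(tokenize(twoLines).equals(tokenize(oneLine))); // prints true
    }
}
```

Note that Matcher.find() silently skips every character that is not part of a token; a real lexer would report unexpected characters as errors instead.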

Why does a programming language need a statement separator?

This is the core of the question. To understand it, let's consider a small language without any statement separator. It contains the following statement types:

var x = foo()
y[0, 1] = x
bar()

Here, y is a two-dimensional array and x is written to one of the entries of y.

Now let's look at these statements as the compiler would see them, with the line breaks ignored:

var x = foo() y[0, 1] = x bar()

Because there is no statement separator, the compiler has to recognize the end of each statement by itself, to make sense of the input. Is the compiler able to do so? I guess in the above example the compiler can do it.

Now, let's add another type of statement to our language:

[x, y] = ["hello", "world"]

The multi assignment allows the programmer to assign multiple values at once. After this line, the variable x will contain the value "hello" while the variable y contains "world". This is handy, for example, for returning multiple values from a function. Now how does this work together with the remaining statement types?

Consider the following sequence of statements:

foo()
[x, y] = [1, 2]

First, we call the method foo. Afterwards, we assign 1 to x and 2 to y. At least this is what we meant to do. Here is what the compiler sees:

foo() [x, y] = [1, 2]

Is the compiler able to recognize each statement? No. There are at least two possible interpretations. The first is the one we intended. Here is the second one:

foo()[x, y] = [1, 2]

What does this mean? First, we call the method foo. This method is supposed to return a two-dimensional array. Now, we write the array [1, 2] at the position [x, y] in the returned array.

The compiler cannot recognize the statements, since there are at least two valid interpretations of the given input. Of course, this should never happen in a real programming language. In this example the ambiguity might be easy to resolve, but the point is that it is hard to design an unambiguous programming language without a statement separator. It is hard because the language designer has to consider every possible combination of statement types to be sure the language is not ambiguous.
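The ambiguity can be made concrete by counting how many different ways a separator-free token stream can be cut into complete statements. The sketch below assumes a hypothetical grammar for the toy language containing only the four statement types from this answer (variable declaration, call, indexed assignment and multi assignment), written as regular expressions over the tokens. It is an illustration of the problem, not how a production parser is built. A count greater than 1 means the input is ambiguous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Counts how many ways a separator-free token stream of the toy language can be
// cut into complete statements. A result greater than 1 means the input is ambiguous.
class AmbiguityCounter {
    // Tokens: string literals, identifiers, integers and single punctuation characters.
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|[A-Za-z_]\\w*|\\d+|[()\\[\\],=]");

    // Hypothetical grammar pieces, as regexes over tokens joined by single spaces.
    private static final String ID = "[A-Za-z_]\\w*";
    private static final String ATOM = "(?:" + ID + "|\\d+|\"[^\"]*\")";
    private static final String LIST = "\\[ " + ATOM + "(?: , " + ATOM + ")* \\]";
    private static final String CALL = ID + " \\( \\)";
    private static final String EXPR = "(?:" + CALL + "|" + LIST + "|" + ATOM + ")";

    // The four statement types of the toy language.
    private static final List<Pattern> STATEMENTS = List.of(
            Pattern.compile("var " + ID + " = " + EXPR),                  // var x = foo()
            Pattern.compile(CALL),                                          // bar()
            Pattern.compile("(?:" + CALL + "|" + ID + ") \\[ " + ATOM
                    + " , " + ATOM + " \\] = " + EXPR),                    // y[0, 1] = x
            Pattern.compile("\\[ " + ID + " , " + ID + " \\] = " + LIST)); // [x, y] = [1, 2]

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {  // whitespace between tokens is skipped and discarded
            tokens.add(m.group());
        }
        return tokens;
    }

    static boolean isStatement(String segment) {
        for (Pattern p : STATEMENTS) {
            if (p.matcher(segment).matches()) {
                return true;
            }
        }
        return false;
    }

    // ways[i] = number of ways to split the first i tokens into whole statements.
    static long countParses(String source) {
        List<String> tokens = tokenize(source);
        int n = tokens.size();
        long[] ways = new long[n + 1];
        ways[0] = 1;
        for (int i = 0; i < n; i++) {
            if (ways[i] == 0) {
                continue;  // no valid parse ends here, so no statement can start here
            }
            for (int j = i + 1; j <= n; j++) {
                if (isStatement(String.join(" ", tokens.subList(i, j)))) {
                    ways[j] += ways[i];
                }
            }
        }
        return ways[n];
    }

    public static void main(String[] args) {
        System.out.println(countParses("var x = foo() y[0, 1] = x bar()")); // 1
        System.out.println(countParses("foo() [x, y] = [1, 2]"));           // 2
    }
}
```

With this grammar, the first example has exactly one reading, while foo() [x, y] = [1, 2] has two, matching the two interpretations described above.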

Thus, the statement separator helps the language designer design the language initially, but more importantly it allows the language designer to easily extend the language in the future, for example by adding new statement types. This is a big deal: once code has been written in your language, you cannot simply change the grammar for existing statement types, because existing code would no longer compile.
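By contrast, once every statement must end with a semicolon, finding statement boundaries no longer depends on which statement types exist. The sketch below assumes a hypothetical variant of the toy language in which each statement is terminated by ';' and string literals contain no escaped quotes. The splitter only has to skip semicolons inside strings, and it would not need to change if a new statement type such as the multi assignment were added. Real Java cannot be split this naively, because the header of a for loop also contains semicolons; a real compiler finds statement ends inside its parser rather than in a separate pre-pass like this one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter for the toy language with ';' terminators: every ';'
// outside a string literal ends a statement, regardless of the statement's type.
class SeparatorSplitter {
    static List<String> splitStatements(String source) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inString = false;
        for (char c : source.toCharArray()) {
            if (c == '"') {
                inString = !inString;  // toy rule: no escaped quotes inside strings
            }
            if (c == ';' && !inString) {
                statements.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (!current.toString().isBlank()) {
            // Assumed strictness: text after the last ';' is an unterminated statement.
            throw new IllegalArgumentException("missing ';' after: " + current.toString().trim());
        }
        return statements;
    }

    public static void main(String[] args) {
        System.out.println(splitStatements("foo(); [x, y] = [1, 2];")); // [foo(), [x, y] = [1, 2]]
        System.out.println(splitStatements("foo()[x, y] = [1, 2];"));   // [foo()[x, y] = [1, 2]]
    }
}
```

Here the programmer, not the compiler, decides which of the two readings from the ambiguous example is meant: "foo(); [x, y] = [1, 2];" yields two statements and "foo()[x, y] = [1, 2];" yields one.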

TL;DR

Summing it all up, the semicolon was introduced as a statement separator in languages that ignore whitespace, because it is easier to design and extend a language that has a statement separator.

answered Sep 24 '22 15:09

Stefan Dollase


Tags:

BlueThen
