Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does the compiler handle case statements?

In the documentation under case statements, it says:

Each value represented by a caseList must be unique in the case statement;

And the example shown,

case I of
  1..5: Caption := 'Low';
  6..9: Caption := 'High';
  0, 10..99: Caption := 'Out of range';
else
  Caption := ;
end

is equivalent to the nested conditional:

if I in [1..5] then
  Caption := 'Low';
else if I in [6..10] then
  Caption := 'High';
else if (I = 0) or (I in [10..99]) then
  Caption := 'Out of range'
else
  Caption := ;

So the first quote suggests it is handled like a set (read the comments here at least one person is with me on this).

Now I know the part

where selectorExpression is any expression of an ordinal type smaller than 32 bits

contradicts with the properties of sets as it is mentioned her under sets:

The base type can have no more than 256 possible values, and their ordinalities must fall between 0 and 255

What is really bugging me is that why it is a necessity to have a unique values in the caseList. If it is equivalent to the if statement, the second value would be just not tested because the compiler already found a prior match?

like image 453
Nasreddine Galfout Avatar asked Dec 03 '22 22:12

Nasreddine Galfout


2 Answers

case statement
The compiler prefers to transform the case statement to a jump table.
In order to make this possible duplicate case labels are not allowed.

Furthermore the compiler does not have to test the case statements in the order you've declared them. For optimization reasons it will reorder these elements as it sees fit; so even if it does not use a jump table, it still cannot allow duplicate case labels.

It is for these same reason (and to keep programmers mentally sane) that the case statement does not allow fall-throughs.

if statement
A (series of) if statements are processed in the order they are declared.
The compiler will test them one by one (and jump out when appropriate).
If the if-statement does not contain duplicates the code will perform the same as an equivalent case statement, but the generated code will most likely be different.
If statements are never converted into jump tables.

about sets
The reason sets are limited to 256 elements is that every element in a set takes up one bit of space.
256 bits = 32 bytes.
Allowing more than 256 elements in a set would grow the representation in memory too much, hampering performance.

The case statement does not use sets, it merely uses set logic in its semantics.

like image 29
Johan Avatar answered Jan 10 '23 18:01

Johan


The documentation takes a specific case statement which is equivalent to a specific if statement.

In general, any case statement can be rewritten as an if statement using the same approach. However, the inverse is not true.

The documentation uses an equivalent if statement to explain the logical behaviour (or semantics) of the case statement. It's not a representation of what the compiler does internally.

How does the compiler handle case statement?

First note there are 2 aspects to this question.

  • Semantically the compiler must handle the case statement as indicated in the documentation. This includes:
    • Ensuring the values of each caseList entry can be evaluated at compile-time.
    • Ensuring the caseList entries are unique.
    • Whichever caseList entry matches, the corresponding caseList-statement is called.
    • If no caseList entries match, the else statements are called.
  • However, the compiler is entitled to optimise the implementation as it sees fit provided the optimised byte/machine code is logically equivalent.
    • Johan's answer describes common optimisations: jump-lists and reordering.
    • These are much easier to apply given the strict semantics.

What is really bugging me is that why it is a necessity to have a unique values in the caseList.

The uniqueness is required to remove ambiguity. Which caseList-statement should be used if more than one matches?

  • It could call the first matching caseList-statement and ignore the rest. (SQL Server CASE statement behaves like this.) Also see [1] below.
  • It could call all of them. (If I remember correctly the MANTIS programming language uses this semantic for its version of a case statement.)
  • It could report an error requiring the programmer to disambiguate the caseList. (Simply put, this is what Delphi's specification requires. Many other languages use the same approach. Quibbling about it is unproductive especially since this choice is unlikely to be much of a hindrance.)

If it is equivalent to the if statement the second value would be just not tested because the compiler already found a prior match.

[1] I'd like to point out that this can make code much more difficult to read. This behaviour is "ok" when using magic literals, but when using const identifiers it becomes dangerous. If 2 different consts have the same value, it won't be immediately apparent that the latter caseList-statement where the caseList also matches won't be called. Also case statements would be subject to behaviour change due to a simple caseList re-ordering.

const
  NEW_CUSTOMER = 0;
  EDIT_CUSTOMER = 1;
  ...
  CANCEL_OPERATION = 0;

case UserAction of
  NEW_CUSTOMER : ...;
  EDIT_CUSTOMER : ...;
  ...
  CANCEL_OPERATION : ...; { Compiler error is very helpful. }
end;

Contradicts with the properties of sets

There's no contradiction. The fact that each caseList value must be unique does not in any way imply it must be "handled like a set". That's your incorrect assumption. Other's making the same assumption are likewise incorrect.

How the uniqueness constraint is checked, is up to the compiler. We can only speculate. But I'd guess the most efficient would be to maintain an ordered list of ranges. Work through each of the caseList of values and ranges, find its position in the above list. If it overlaps, then report an error, otherwise add it to the list.

like image 183
Disillusioned Avatar answered Jan 10 '23 20:01

Disillusioned