Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When looking up constants mapped to small integers, is it faster to use a case statement or an constant array?

For example: I have the numbers 1 through 7 mapped to the days of the week. I can look them up with a seven-item case statement, or use a seven-item constant array. Which is faster?

Case example:

function GetDayNameBr(Num: Integer): String;
begin
  case Num of
    1: Result := 'Domingo';
    2: Result := 'Segunda';
    3: Result := 'Terça';
    4: Result := 'Quarta';
    5: Result := 'Quinta';
    6: Result := 'Sexta';
    7: Result := 'Sábado';
  end;       
end;

Constant array example:

function GetDayNameBr(Num: Integer): String;
const
  DayNames: array [1..7] of String = (
    'Domingo',
    'Segunda',
    'Terça',
    'Quarta',
    'Quinta',
    'Sexta',
    'Sábado');
begin
  Result := DayNames[Num];       
end; 
like image 914
Daniel Santos Avatar asked Jun 26 '13 14:06

Daniel Santos


3 Answers

The main reason for the different performance characteristics of these two functions are that they do different things. You are not comparing like with like. When the input value is in the range 1 to 7 inclusive, the behaviour is identical. However, when the input value is outside that range then the behaviour diverges.

The first version, that which uses case, must first check that the value is in the range 1 to 7. Only then is it allowed to actually assign to Result. If the value is in the range 1 to 7 then the compiler turns the case statement into an unconditional jmp statement which looks like this:

jmp dword ptr [eax*4+$40428f]

Here eax is the day index. And the target of these jump are instructions that simply assign string literals to the Result variable.

The second version, that which uses an array, does not check whether or not the input value is in range. It indexes into the array directly even if the input value is out of range, and of course such array index lead to undefined behaviour. So this is where the behaviour diverges.

Looking at this purely from performance, and ignoring the semantic differences in your functions, the main difference is that the version using case has a test and branch on the input value which is not present in the array version. In addition, the version using case has larger code and so is probably less cache friendly. So, from an analysis of the code we might expect the array version to be faster. It has less to do, it doesn't branch, the code is smaller.

If performance actually matters to you then you would need to perform some realistic timings in the actual setting in which this code runs. I cannot perform those timings because they would be artificial. Any times only have real meaning in the context of your code. It is very plausible indeed that in the setting of your program, you would not be able to measure the difference between the two versions. In which case the analysis above would be moot.

like image 138
David Heffernan Avatar answered Nov 09 '22 07:11

David Heffernan


Both are almost equally fast, at least under x86, i.e. with the 32 bit Delphi compiler.

The array will generate an indexed lookup, whereas the case will generate a jump instruction based on a lookup table, when targeting 32 bit. Array will be a little bit faster, but only slightly.

But AFAIR I discovered that the case instruction won't generate such a lookup table under x64, when targeting 64 bit. It generates a list of comparison and conditional jumps (something like if value=1 then ... else if value=2 then...) which is noticeably slower.

In your case, I would use an array lookup and an enumeration instead of plain integer values. It will compile as integers, but will be much easier to debug and make evolve. If the enumeration changes, the constant array won't compile any more, so you will be able to avoid some kind of issues at compile time, not at run time. I try to make exhaustive use of enumerations for such small lists, instead of integers. This is one straight of Delphi/pascal, which I miss very much in C# or Java.

type
  TDay = (dDomingo, dSegunda, dTerca, dQuarta, dSexta, dSabado);

function GetDayNameBr(Num: TDay): String; 
const
  DayNames: array [TDay] of String = (
    'Domingo',
    'Segunda',
    'Terça',
    'Quarta',
    'Quinta',
    'Sexta',
    'Sábado');
begin
  Result := DayNames[Num];       
end; 

or, even better IMHO, directly DayNames[Num] in the code, which will be both the safest and the fastest, on all platforms.

like image 9
Arnaud Bouchez Avatar answered Nov 09 '22 07:11

Arnaud Bouchez


Both are super fast, but I believe the array approach is unnoticably faster, if there is any difference at all.

However, I would definitely go for the array approach, simply because it separates the logic from the raw data. (Imagine you need to support two different languages -- compare how you'd do that in each case.) It is also more idiomatic.

like image 8
Andreas Rejbrand Avatar answered Nov 09 '22 06:11

Andreas Rejbrand