Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are there local variables in stack-based IL bytecode

In a stack-based intermediate language, such as CIL or Java bytecode, why are there local variables? One could just use only the stack. May not be so easy for hand-crafted IL, but a compiler can surely do it. But my C# compiler does not.

Both the stack and the local variables are private to the method and go out of scope when the method returns. So it could not have anything to do with side-effects visible from outside the method (from another thread).

A JIT compiler would eliminate loads and stores to both stack slots and local variables when generating machine code, if I am correct, so the JIT compiler also does not see the need for local variables.

On the other hand, the C# compiler generates loads and stores for local variables, even when compiling with optimizations enabled. Why?


Take for example, the following contrived example code:

static int X()
{
    int a = 3;
    int b = 5;
    int c = a + b;
    int d;
    if (c > 5)
        d = 13;
    else
        d = 14;
    c += d;
    return c;
}

When compiled in C#, with optimizations, it produces:

    ldc.i4.3        # Load constant int 3
    stloc.0         # Store in local var 0
    ldc.i4.5        # Load constant int 5
    stloc.1         # Store in local var 1
    ldloc.0         # Load from local var 0
    ldloc.1         # Load from local var 1
    add             # Add
    stloc.2         # Store in local var 2
    ldloc.2         # Load from local var 2
    ldc.i4.5        # Load constant int 5
    ble.s label1    # If less than, goto label1
    ldc.i4.s 13     # Load constant int 13
    stloc.3         # Store in local var 3
    br.s label2     # Goto label2
label1:
    ldc.i4.s 14     # Load constant int 14
    stloc.3         # Store in local var 3
label2:
    ldloc.2         # Load from local var 2
    ldloc.3         # Load from local var 3
    add             # Add
    stloc.2         # Store in local var 2
    ldloc.2         # Load from local var 2
    ret             # Return the value

Note the loads and stores to the four local variables. I could write the exact same operations (disregarding the obvious constant propagation optimization) without using any local variables.

    ldc.i4.3        # Load constant int 3
    ldc.i4.5        # Load constant int 5
    add             # Add
    dup             # Duplicate top stack element
    ldc.i4.5        # Load constant int 5
    ble.s label1    # If less than, goto label1
    ldc.i4.s 13     # Load constant int 13
    br.s label2     # Goto label2
label1:
    ldc.i4.s 14     # Load constant int 14
label2:
    add             # Add
    ret             # Return the value

It seems correct to me, and a lot shorter and more efficient. So, why do stack-based intermediate languages have local variables? And why does the optimizing compiler use them so extensively?

like image 802
Daniel A.A. Pelsmaeker Avatar asked Sep 16 '12 23:09

Daniel A.A. Pelsmaeker


1 Answers

This answer is purely speculative -- but I suspect that the answer has 3 parts.

1: The code transforms to prefer Dup over local variables is very non-trivial, even when you ignore side effects. It adds a lot of complexity and potentially a lot of execution time to the optimization.

2: You can't ignore side effects. In the example where everything is just a literal, it is very easy to know that the values are in stack or locals, and are therefore under complete control of the current instructions. Once those values come from the heap, static memory, or method calls, you can no longer shuffle things around to use Dup instead of locals. Changing the order may change how things actually work, and cause unintended consequences due to side effects or external access to shared memory. This means that usually, you can't make these optimizations.

3: The assumption that stack values are faster that local variables is not a good assumption -- it may be true for a particular IL->machine code transform that stack values are faster, but there is no reason why a smart JIT would not put a stack location into memory and a local variable into a register. It is the JIT's job to know what is fast and what is slow for the current machine, and it is the JIT's job to solve the problem. By design, the CIL compiler does not have an answer to whether locals or stack are faster; and so the measurable difference between these results is only in code size.

Put together, 1 means that it is hard and has a non-trivial cost, 2 means that real world cases where it would be valuable are few, and 3 means that 1 and 2 are irrelevant anyways.

Even if the goal is to minimize CIL size, which IS a measurable goal for the CIL compiler, reason #2 describes this as a small improvement to a small amount of cases. The Pareto principle can't tell us that it is a BAD idea to implement this kind of optimization, but it will recommend that there are probably BETTER uses of developer time.

like image 187
danwyand Avatar answered Sep 18 '22 07:09

danwyand