Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimization of "static" loops

I'm writing a compiled language for fun, and I've recently gotten on a kick for making my optimizing compiler very robust. I've figured out several ways to optimize some things, for instance, 2 + 2 is always 4, so we can do that math at compile time, if(false){ ... } can be removed entirely, etc, but now I've gotten to loops. After some research, I think that what I'm trying to do isn't exactly loop unrolling, but it is still an optimization technique. Let me explain.

Take the following code.

String s = "";
for(int i = 0; i < 5; i++){
    s += "x";
}
output(s);

As a human, I can sit here and tell you that this is 100% of the time going to be equivalent to

output("xxxxx");

So, in other words, this loop can be "compiled out" entirely. It's not loop unrolling, but what I'm calling "fully static", that is, there are no inputs that would change the behavior of the segment. My idea is that anything that is fully static can be resolved to a single value, anything that relies on input or makes conditional output of course can't be optimized further. So, from the machine's point of view, what do I need to consider? What makes a loop "fully static?"

I've come up with three types of loops that I need to figure out how to categorize. Loops that will always end up with the same machine state after every run, regardless of inputs, loops that WILL NEVER complete, and loops that I can't figure out one way or the other. In the case that I can't figure it out (it conditionally changes how many times it will run based on dynamic inputs), I'm not worried about optimizing. Loops that are infinite will be a compile error/warning unless specifically suppressed by the programmer, and loops that are the same every time should just skip directly to putting the machine in the proper state, without looping.

The main case of course to optimize is the static loop iterations, when all the function calls inside are also static. Determining if a loop has dynamic components is easy enough, and if it's not dynamic, I guess it has to be static. The thing I can't figure out is how to detect if it's going to be infinite or not. Does anyone have any thoughts on this? I know this is a subset of the halting problem, but I feel it's solvable; the halting problem is a problem due to the fact that for some subsets of programs, you just can't tell it may run forever, it may not, but I don't want to consider those cases, I just want to consider the cases where it WILL halt, or it WILL NOT halt, but first I have to distinguish between the three states.

like image 646
LadyCailin Avatar asked May 01 '12 03:05

LadyCailin


1 Answers

This looks like a kind of a symbolic solver that can be defined for several classes, but not generally.

Let's restrict the requirements a bit: no number overflow, just for loops (while can be sometimes transformed to full for loop, except when using continue etc.), no breaks, no modifications of the control variable inside the for loop.

for (var i = S; E(i); i = U(i)) ...

where E(i) and U(i) are expressions that can be symbolically manipulated. There are several classes that are relatively easy:

U(i) = i + CONSTANT : n-th cycle the value of i is S + n * CONSTANT

U(i) = i * CONSTANT : n-th cycle the value of i is S * CONSTANT^n

U(i) = i / CONSTANT : n-th cycle the value of i is S * CONSTANT^-n

U(i) = (i + CONSTANT) % M : n-th cycle the value of i is (S + n * CONSTANT) % M

and some other quite easy combinations (and some very difficult ones)

Determining whether the loop terminates is searching for n where E(i(n)) is false. This can be done by some symbolic manipulation for a lot of cases, but there is a lot of work involved in making the solver.

E.g.

  • for(int i = 0; i < 5; i++),
  • i(n) = 0 + n * 1 = n, E(i(n)) => not(n < 5) =>
  • n >= 5 => stops for n = 5

  • for(int i = 0; i < 5; i--),
  • i(n) = 0 + n * -1 = -n, E(i(n)) => not(-n < 5) => -n >= 5 =>
  • n < -5 - since n is a non-negative whole number this is never true - never stops

  • for(int i = 0; i < 5; i = (i + 1) % 3),
  • E(i(n)) => not(n % 3 < 5) => n % 3 >= 5 => this is never true => never stops

  • for(int i = 10; i + 10 < 500; i = i + 2 * i) =>
  • for(int i = 10; i < 480; i = 3 * i),
  • i(n) = 10 * 3^n,
  • E(i(n)) => not(10 * 3^n < 480) => 10 * 3^n >= 480 => 3^n >= 48 => n >= log3(48) => n >= 3.5... =>
  • since n is whole => it will stop for n = 4

for other cases it would be good if they can get transformed to the ones you can already solve...

Many tricks for symbolic manipulation come from Lisp era, and are not too difficult. Although the ones described (or variants) are the most common types practice, there are many more difficult and/or impossible to solve scenarios.

like image 121
jJ' Avatar answered Nov 19 '22 01:11

jJ'