Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Untying Knuth's knots: how to restructure spaghetti code?

This question was inspired by How to transform a flow chart into an implementation? which asks about ways of algorithmically eliminating the goto statement from code. The answer to the general problem is described in this scientific paper.

I have implemented some code following the high-level sketch of Algorithm X from Knuth's The art of computer programming describing the generation of Lexicographic permutations with restricted prefixes (see p. 16 of this draft).

This is the corresponding flow chart of the above algorithm.

This might be a very clever and very efficient algorithm, but the structure of the code seems to be challenging to follow. I ended up using the good old goto-style implementation:

//Algorithm X;
1:
initialize();
2:
enter_level(k);
3:
set(a[k],q);
if(test() == ok) {
  if (k == n) {
    visit();
    goto 6;
  }
  goto 4;
}
goto 5;
4:
increase(k);
goto 2;
5:
increasev2(a[k]);
if (q != 0) {
  goto 3;
}
6:
decrease(k);
if (k==0) {
  goto 7;
}
set(p,u_k);
goto 5;
7:
return;

The question is: how can this code be refactored to eliminate all the goto calls?

One (bogus) answer is a suggestion to "look up the cited scientific paper, and follow it line by line" - indeed, that is certainly a possibility. But this question is about what experienced programmers see instantly once they give a glance at this spaghetti code.

I am interested in the how of refactoring, step by step, more than just the code.


Note:

  1. It is straightforward to actually implement Algorithm X based on its high-level specification and goto jumps. Implementing the black-box functions initialize() etc. would take only a few extra instructions, but those are irrelevant regarding the structure of the code. It is not important what is happening during the function calls, as the focus is now on the flow of the program.
  2. The usual debate of "is GOTO still considered harmful?" is absolutely irrelevant regarding this question, and should not be addressed at all in the answers and in the comments.

Related: how to work with or complete the spaghetti code?

like image 937
Matsmath Avatar asked May 06 '16 18:05

Matsmath


5 Answers

Without too much effort (and not a lot of risk) you can make some quick reductions in the amount of gotos and labels.

1) remove labels not referenced anywhere (this would be label 1:)

2) look for blocks of code that can't be entered except by goto, that are called in few places. These can often be factored out simply. 4: can be dealt with by moving the code to where it's called, and done safely since its only exit is a goto. This also allows us to remove the goto 5 above it, since that code will simply fall through to 5:. 7: can be handled by modifying the if statement. At this point we have

initialize();
2:
enter_level(k);
3:
set(a[k],q);
if(test() == ok) {
  if (k == n) {
    visit();
    goto 6;
  }
  increase(k);
  goto 2;
}
5:
increasev2(a[k]);
if (q != 0) {
  goto 3;
}
6:
decrease(k);
if (k!=0) {
  set(p,u_k);
  goto 5;
}
return;

I'd be inclined to stop here. But if you continue, it becomes a matter of identifying loops and replacing gotos with loop constructs. However, because of the way that code is structured, the risk in making those changes seems much greater. Also, you likely end up with breaks and continues which are gotos of a sort anyway. What I ended up with is this (and I wouldn't vouch for its correctness without some very rigorous testing):

initialize();
enter_level(k);
while (true) {
  set(a[k],q);
  if(test() == ok) {
    if (k == n) {
      visit();
    } else {
      increase(k);
      enter_level(k);
      continue;
    }
  } else {
    increasev2(a[k]);
    if (q != 0) {
      continue; 
    }
  }
  while (true) {
    decrease(k);
    if (k!=0) {
      set(p,u_k);
      increasev2(a[k]);
      if (q != 0) {
        break; 
      }
    } else {
      return;
    }
  }
}

I made 3: a loop, and 6: an inner loop. I got rid of the goto 5 by copying the 5: code in place of the goto, and replacing the goto 3 with a break. That makes it a little easier to make cleaner loops. The goto 6 is fixed through the use of an else. The goto 3's become continues.

After this (if you have the energy left) you can try to change the loops from while(true) with continues into whiles with actual conditions.

It's a good idea to develop your tests first, and then make one or two changes and test. Make another change, then test again. If you don't do that, it's easy to make a structural mistake early on that then invalidates the subsequent steps and forces you to start all over again.

like image 126
hatchet - done with SOverflow Avatar answered Oct 06 '22 03:10

hatchet - done with SOverflow


I sketched an algorithm for OP earlier at https://stackoverflow.com/a/36661381/120163

Found a better paper that discusses how to generate structured code while preserving the original control flow graph exactly:

W.D Maurer, "Generalized structured programs and loop trees", Science of Computer Programming, 2007

I followed that procedure (on paper, hope I did it right, looks OK at 2:40 am). His basic trick is to find strongly connected regions (cycles in the code); these will become loops; He then breaks that cycle by deleting an edge; this eventually becomes a loop backlink (restored when he is complete). The process is repeated until no more cycles can be found; what is left is essentially a structured program with identified loops. It is tricky to do this right; you really need an automated procedure. Your bit of code, although small, is still pretty nasty :-}

I cheated in one place. Maurer insists that forward gotos are OK, even into the middle of loop. If you buy that, then you can preserve the CFG exactly. If not, you have to handle the case where a loop has two or more entry points; your algorithm has such a loop. I solved the problem by coding the loop, and coding a loop-tail-end-fragment equivalent that acts like the first iteration of jump-into-the-middle, which is followed by the loop itself.

My notation is a little funny: most languages don't have "block{...}" constructs. [The one I code in (see bio) does]. Think of this as being an "execute one iteration" loop :-} I assume that blocks/loops have loop exits and loop continues. If you dont have these, you can simulate them with sufficient quantity of just block{ ... } and exit_block@N.

EDIT after accepted: In the light of day, I didn't do it right, I left out the of the while loop@3. I've patched that; the need for the block construct now goes away because I can exit the while loop@3 to achieve the same affect. Actually, the code reads a little better.

I left your numeric labels in, even where they aren't needed, for easier reference.

//Algorithm X;
1:
initialize();
2:
while (true) {
   enter_level(k);
   3: 
   while (true) {
      set(a[k],q);
      if (test() == ok) {
         if (k != n) exit_while@3;
         visit();
         decrease(k); // replicate logic at 6 to avoid jumping into middle of 5 loop
         if (k==0) return;
         set(p,u_k);
      }
      5:
      while (true) {
         increasev2(a[k]);
         if (q != 0) continue_while@3;
         6:
         decrease(k);
         if (k==0) return;
         set(p,u_k);
      } // while(true)@5
  } // while(true)@3
  4:
  increase(k);
} // while(true)@2

Unlike most other answers I've seen so far, this runs at the same speed as the original (no extra flags or flag checks).

@hatchet's answer is interesting; a) it is equally fast, b) he chose to handle the two entry loop by the same technique, but he chose the "other entry" as the loop top. He did something similar with the "enter_level(k)" operation at label 2.

It is interesting that all this structuring doesn't seem to help readability of this code one bit. Makes one wonder about the whole point of "structured programs". Maybe well-designed spaghetti isn't so bad :-}

like image 40
Ira Baxter Avatar answered Oct 06 '22 04:10

Ira Baxter


In c++ the algorithm could be written as:

void initialize() {}
void enter_level(int k) {}
void set(int x,int y) {}
bool test() { return true; }
void visit() {}
void increase(int k) {}
void increasev2(int k) {}
void decrease(int k) {}

void algorithm_x()
{
    int k{0};
    int a[] ={1,2,3,4,5};
    int q{0};
    bool ok{true};
    int n{0};
    int p{0};
    int u_k{0};

        //Algorithm X;
    lbl1:
        initialize();
    lbl2:
        enter_level(k);
    lbl3:
        set(a[k],q);
        if (test() == ok) {
            if (k == n) {
                visit();
                goto lbl6;
            }
            goto lbl4;
        }
        goto lbl5;
    lbl4:
        increase(k);
        goto lbl2;
    lbl5:
        increasev2(a[k]);
        if (q != 0) {
            goto lbl3;
        }
    lbl6:
        decrease(k);
        if (k==0) {
            goto lbl7;
        }
        set(p,u_k);
        goto lbl5;
    lbl7:
        return;

}

int main()
{
    algorithm_x();
    return 0;
}

Assuming we don't use break statements the program could then be:

void initialize() {}
void enter_level(int k) {}
void set(int x,int y) {}
bool test() { return true; }
void visit() {}
void increase(int k) {}
void increasev2(int k) {}
void decrease(int k) {}

void algorithm_x()
{
    int k{0};
    int a[] ={1,2,3,4,5};
    int q{0};
    bool ok{true};
    int n{0};
    int p{0};
    int u_k{0};

    bool skiptail{false};

    //Algorithm X;
    initialize();
    enter_level(k);
    while (true) {

        skiptail = false;
        set(a[k],q);
        if (test() == ok) {
            if (k == n) {
                visit();
                decrease(k);
                if (k==0) {
                    return;
                }
                set(p,u_k);
                while (true) {
                    increasev2(a[k]);
                    if (q != 0) {
                        //goto lbl3;
                        skiptail = true;
                    }
                    if (!skiptail) decrease(k);
                    if (!skiptail) if (k==0) {
                        return;
                    }
                    if (!skiptail) set(p,u_k);
                }
            }
            if (!skiptail) increase(k);
            if (!skiptail) enter_level(k);
            //goto lbl3;
            skiptail = true;
        }
        if (!skiptail) while (true) {
            increasev2(a[k]);
            if (q != 0) {
                //goto lbl3;
                skiptail = true;
            }
            if (!skiptail) decrease(k);
            if (!skiptail) if (k==0) {
                return;
            }
            if (!skiptail) set(p,u_k);
        }
        if (!skiptail) increase(k);
        if (!skiptail) enter_level(k);
        //goto lbl3;
        skiptail = true;
        if (!skiptail) while (true) {
            increasev2(a[k]);
            if (q != 0) {
                //goto lbl3;
                skiptail = true;
            }
            if (!skiptail) decrease(k);
            if (!skiptail) if (k==0) {
                return;
            }
            if (!skiptail) set(p,u_k);
        }
    }

}

int main()
{
    algorithm_x();
    return 0;
}

The change used the following algorithm:

  1. Get rid of unused labels. Remove lbl1

  2. If a label ends with a goto, then replace that block wherever it is used. Remove lbl4, lbl6, lbl7

  3. if a label returns to itself then place block in while (true). Remove bottom lbl5 (lbl5 is now self contained and can be replaced where used)

  4. if a block is self contained, then replace wherever it is used. Remove lbl5

  5. If one label follows another then place a goto next label at the end of the block so that it can be replaced as per rule 2. Remove lbl2 (which can goto lbl3)

  6. now we're left with the last label's goto's throughout the code. Replace goto lbl3 with skiptail=true,place the remaining block in a while (true) block, and set remaining statements to check if skiptail=false. Remove lbl3 and replace with skiptail = false.

like image 21
wally Avatar answered Oct 06 '22 04:10

wally


I've never used goto, but this seemed like a fun challenge, so I tried my own hand at refactoring.

First of all, go through the code and see how many statements are gotoing each label; this will be important to keep in mind to avoid mistakes. In your example, nothing is leading to 1, so we can ignore it.

Sometimes, I found it useful to add gotos when they were implied by the control flow. It helped to keep track of the order when I moved code between things.

The best way to refactor gotos is working upwards from the inside or bottom up.

  • The last instruction is 7:return;, which can simply be moved to wherever goto 7 is called. That's easy.

  • Next, I try to see which labels end with a goto (unconditionally) and come directly after a different goto. That would be 4 in this case; it can be moved in front of 2, inside an if controlled by a sentinel (in preparation for a loop). (goto my first point to see that 2 can be removed now.)

  • The next thing I did was put 5 and 6 into a loop. If I was wrong, I could just backtrack anyway.

  • At this point, I see that 6 will be executed after either 3 or 5. I also see that 5 can execute 3, so I decide to move 3 after 5. I add a variable so that I can skip 5 the first time. I set it to true at the end of 6.

  • In order to ensure 5 can go directly to 6 when it needs to, I can wrap 3 in an if statement with the opposite condition of 5 executing. When I do need to go from 5 to 3, I can change the condition while I'm inside 5 so that 3 is executed directly afterwards.

  • At this point I only have one goto, which goes from 3 to 4. If I change that to a break I can exit the one loop and I arrive at the end. To get to 4, I just wrap everything (except 1) in a loop.

You might be able use this trick to break out of nested loops without using goto, if you have any of that going on, but that wasn't necessary in this case.

In the end, I got this code (labels included only for clarity):


1: initialize();
reached4=false;
do5 = false;
while(true){
    if (reached4){
      4: increase(k);
    }
    2: enter_level(k);
    while(true){
      if(do5){
        5:
        increasev2(a[k]);
        if (q != 0) {
          do5 = false;//goto 3
        }
      }
      if(!do5){
        3:
        set(a[k],q);
        if(test() == ok) {
          if (k == n) {
            visit();//goto 6;
          }else{
            reached4 = true;
            break;//goto 4
          }
        }
      }
      6:
      decrease(k);
      if (k==0) {
        7: return;
      }
      set(p,u_k);
      do5 = true;
    }
}
like image 24
Laurel Avatar answered Oct 06 '22 05:10

Laurel


You could use a lot of variables to simulate the flow of the gotos to work with if's and while's

initialize();

enterLevel = true;
executeWhile = true;

do 
{

    if (enterLevel)
    {
        enter_level(k);
    }

    enterLevel = false;

    goto4 = false;
    goto5 = false;
    goto6 = false;

    set(a[k],q);
    if(test() == ok) 
    {
        if (k == n) 
        {
            visit();
            goto6 = true;
        }
        else
        {
            goto4 = true;
        }
    }
    else
    {
        goto5 = true;
    }

    if (goto4) 
    {
        increase(k);
        enterLevel = true;
    }
    else
    {
        do
        {
            if(goto5)
            {
                increasev2(a[k]);
                goto6 = goto5 = !(q != 0); // if (q != 0) { goto6 = goto5 = false; } else { goto6 = goto5 = true; }
            }
            if(goto6)
            {
                decrease(k);
                executeWhile = !(k==0); // if (k == 0) { executeWhile = false; } else { executeWhile = true; }
                set(p,u_k);
                goto5 = true;
            }
        } while (goto5 && executeWhile);
    }
} while (executeWhile);

Whether this version is better than the one with goto's I can't say though.


First, I completely separate all the labels.

Then I identified that are 2 loops going on here:

1 - 
    * label 4 -> goto 2
    * label 5 -> goto 3. 

Both go to top of the code, but one executes enter_level(k) and the other doesn't. That's why the enterLevel var.

2 - 
    * label 6 -> goto 5. This goes up a little in the code, and then executes again. 

Inside this loop, there are 2 situations where it goes out:

    * label 5 -> goto 3. The same as before, but now inside a nested loop
    * label 6 -> goto 7. The way out of the outer loop.

The other variables and if's are just to mantain the control flow.

Yes, I could have used some breaks (and the code could have become shorter), but as the question is about goto's, I personally prefered to not use them.

like image 43
hlscalon Avatar answered Oct 06 '22 03:10

hlscalon