Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ refactoring: conditional expansion and block elimination

I'm in the process of refactoring a very large amount of code, mostly C++, to remove a number of temporary configuration checks which have become permanantly set to given values. So for example, I would have the following code:

#include <value1.h>
#include <value2.h>
#include <value3.h>

...

if ( value1() )
{
    // do something
}

bool b = value2();

if ( b && anotherCondition )
{
    // do more stuff
}

if ( value3() < 10 )
{
    // more stuff again
}

where the calls to value return either a bool or an int. Since I know the values that these calls always return, I've done some regex substitution to expand the calls to their normal values:

// where:
//   value1() == true
//   value2() == false
//   value3() == 4

// TODO: Remove expanded config (value1)
if ( true )
{
    // do something
}

// TODO: Remove expanded config (value2)
bool b = false;

if ( b && anotherCondition )
{
    // do more stuff
}

// TODO: Remove expanded config (value3)
if ( 4 < 10 )
{
    // more stuff again
}

Note that although the values are fixed, they are not set at compile time but are read from shared memory so the compiler is not currently optimising anything away behind the scenes.

Although the resultant code looks a bit goofy, this regex approach achieves a lot of what I want since it's simple to apply and removes dependence on the calls, while not changing the behaviour of the code and it's also likely that the compiler may then optimise a lot of it out knowing that a block can never be called or a check will always return true. It also makes it reasonably easy (especially when diffing against version control) to see what has changed and take the final step of cleaning it up so the code above code eventually looks as follows:

// do something

// DONT do more stuff (b being false always prevented this)

// more stuff again

The trouble is that I have hundreds (possibly thousands) of changes to make to get from the second, correct but goofy, stage to get to the final cleaned code.

I wondered if anyone knew of a refactoring tool which might handle this or of any techniques I could apply. The main problem is that the C++ syntax makes full expansion or elimination quite difficult to achieve and there are many permutations to the code above. I feel I almost need a compiler to deal with the variation of syntax that I would need to cover.

I know there have been similar questions but I can't find any requirement quite like this and also wondered if any tools or procedures had emerged since they were asked?

like image 836
Component 10 Avatar asked Apr 11 '12 08:04

Component 10


2 Answers

It sounds like you have what I call "zombie code"... dead in practice, but still live as far as the compiler is concerned. This is a pretty common issue with most systems of organized runtime configuration variables: eventually some configuration variables arrive at a permanent fixed state, yet are reevaluated at runtime repeatedly.

The cure isn't regex, as you have noted, because regex doesn't parse C++ code reliably. What you need is a program transformation system. This is a tool that really parses source code, and can apply a set of code-to-code rewriting rules to the parse tree, and can regenerate source text from the changed tree.

I understand that Clang has some capability here; it can parse C++ and build a tree, but it does not have source-to-source transformation capability. You can simulate that capability by writing AST-to-AST transformations but that's a lot more inconvenient IMHO. I believe it can regenerate C++ code but I don't know if it will preserve comments or preprocessor directives.

Our DMS Software Reengineering Toolkit with its C++(11) front end can (and has been used to) carry out massive transformations on C++ source code, and has source-to-source transformations. AFAIK, it is the only production tool that can do this. What you need is a set of transformations that represent your knowledge of the final state of the configuration variables of interest, and some straightforward code simplification rules. The following DMS rules are close to what you likely want:

  rule fix_value1():expression->expression
    "value1()" -> "true";
  rule fix_value2():expression->expression
    "value2()" -> "false";
  rule fix_value3():expression->expression
    "value3()" -> "4";

  rule simplify_boolean_and_true(r:relation):condition->condition
     "r && true" -> "r".
  rule simplify_boolean_or_ture(r:relation):condition->condition
     "r || true" -> "true".
  rule simplify_boolean_and_false(r:relation):condition->condition
     "r && false" -> "false".
  ...
  rule simplify_boolean_not_true(r:relation):condition->condition
     "!true" -> "false".
  ...

  rule simplify_if_then_false(s:statement): statement->statement
      " if (false) \s" -> ";";
  rule simplify_if_then_true(s:statement): statement->statement
      " if (true) \s" -> "\s";
  rule simplify_if_then_else_false(s1:statement, s2:statement): statement->statement
      " if (false) \s1 else \s2" -> "\s2";
  rule simplify_if_then_else_true(s1:statement, s2: statement): statement->statement
      " if (true) \s1 else \s2" -> "\s2";

You also need rules to simplify ("fold") constant expressions involving arithmetic, and rules to handle switch on expressions that are now constant. To see what DMS rules look like for integer constant folding see Algebra as a DMS domain.

Unlike regexes, DMS rewrite rules cannot "mismatch" code; they represent the corresponding ASTs and it is that ASTs that are matched. Because it is AST matching, they have no problems with whitespace, line breaks or comments. You might think they could have trouble with order of operands ('what if "false && x" is encountered?'); they do not, as the grammar rules for && and || are marked in the DMS C++ parser as associative and commutative and the matching process automatically takes that into account.

What these rules cannot do by themselves is value (in your case, constant) propagation across assignments. For this you need flow analysis so that you can trace such assignments ("reaching definitions"). Obviously, if you don't have such assignments or very few, you can hand patch those. If you do, you'll need the flow analysis; alas, DMS's C++ front isn't quite there but we are working on it; we have control flow analysis in place. (DMS's C front end has full flow analysis).

(EDIT February 2015: Now does full C++14; flow analysis within functions/methods).

We actually applied this technique to 1.5M SLOC application of mixed C and C++ code from IBM Tivoli almost a decade ago with excellent success; we didn't need the flow analysis :-}

like image 188
Ira Baxter Avatar answered Nov 20 '22 00:11

Ira Baxter


You say:

Note that although the values are reasonably fixed, they are not set at compile time but are read from shared memory so the compiler is not currently optimising anything away behind the scenes.

Constant-folding the values by hand doesn't make a lot of sense unless they are completely fixed. If your compiler provides constexpr you could use that, or you could substitute in preprocessor macros like this:

#define value1() true
#define value2() false
#define value3() 4

The optimizer would take care of you from there. Without seeing examples of exactly what's in your <valueX.h> headers or knowing how your process of getting these values from shared memory is working, I'll just throw out that it could be useful to rename the existing valueX() functions and do a runtime check in case they change again in the future:

// call this at startup to make sure our agreed on values haven't changed
void check_values() {
    assert(value1() == get_value1_from_shared_memory());
    assert(value2() == get_value2_from_shared_memory());
    assert(value3() == get_value3_from_shared_memory());
}
like image 1
HostileFork says dont trust SE Avatar answered Nov 20 '22 00:11

HostileFork says dont trust SE