Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

tracking uninitialized static variables

I need to debug an ugly and huge math C library, probably once produced by f2c. The code is abusing local static variables, and unfortunately somewhere it seems to exploit the fact that these are automatically initialized to 0. If its entry function is called with the same input twice, it is giving different results. If I unload the library and reload it again, it works correctly. It needs to be fast, so I would like to get rid of the load/unload.

My question is that how to uncover these errors with valgrind or by any other tool without manually walking through the entire code.

I am hunting places where a local static variable is declared, read first, and written only later. The problem is even further complicated by the fact that the static variables are sometimes passed further via pointers (yep - it is so ugly).

I understand that one can argue that mistakes like this should not be necessary detected by an automatic tool, as in some scenarios this is exactly the intended behaviour. Still, is there a way to make the auto-initialized local static variables "dirty"?

like image 784
takbal Avatar asked Oct 19 '11 14:10

takbal


4 Answers

The devil is in the details, but this may work for you:

First, get Frama-C. If you are using Unix, your distribution may have a package. The package won't be the last version but it may be good enough and it will save you some time if you install it this way.

Say your example is as below, only so much bigger that it's not obvious what is wrong:

int add(int x, int y)
{
  static int state;
  int result = x + y + state; // I tested it once and it worked.
  state++;
  return result;
}

Type a command like:

frama-c -lib-entry -main add -deps ugly.c

Options -lib-entry -main add mean "look at function add". Option -deps computes functional dependencies. You'll find these "functional dependencies" in the log:

[from] Function add:
     state FROM state; (and default:false)
     \result FROM x; y; state; (and default:false)

This lists the actual inputs the results of add depend on, and the actual outputs computed from these inputs, including static variables read from and modified. A static variable that was properly initialized before being used would normally not appear as input, unless the analyzer was unable to determine that it was always initialized before being read from.

The log shows state as dependency of \result. If you expected the returned result to depend only on the arguments (meaning two calls with the same arguments produce the same result), it's a hint there may be something wrong here, with the variable state.

Another hint shown in the above lines is that the function modifies state.

This may help or not. Option -lib-entry means that the analyzer does not assume that any non-const static variable has kept its value at the time the function under analysis is called, so that may be too imprecise for your code. There are ways around that, but then it is up to you whether you want to gamble the time it takes to learn these ways.

EDIT: here is a more complex example:

void initialize_1(int *p)
{
  *p = 0;
}

void initialize_2(int *p)
{
  *p; // I made a mistake here.
}

int add(int x, int y)
{
  static int state1;
  static int state2;

  initialize_1(&state1);
  initialize_2(&state2);

  // This is safe because I have initialized state1 and state2:
  int result = x + y + state1 + state2; 

  state1++;
  state2++;
  return result;
}

On this example, the same command produces the results:

[from] Function initialize_1:
         state1 FROM p
[from] Function initialize_2:
[from] Function add:
         state1 FROM \nothing
         state2 FROM state2
         \result FROM x; y; state2

What you see for initialize_2 is an empty list of dependencies, meaning the function assigns nothing. I will make this case clearer by displaying an explicit message rather than just an empty list. If you know what any of the functions initialize_1, initialize_2 or add is supposed to do, you can compare this a priori knowledge to the results of the analysis and see that something is wrong for initialize_2 and add.

SECOND EDIT: and now my example shows something strange for initialize_1, so perhaps I should explain that. Variable state1 depends on p in the sense that p is used to write to state1, and if p had been different, then the final value of state1 would have been different. Here is a last example:

int t[10];

void initialize_index(int i)
{
  t[i] = 1;
}

int main(int argc, char **argv)
{
  initialize_index(argv[1][0]-'0');
}

With the command frama-c -deps t.c, the dependencies computed for initialize_index are:

[from] Function initialize_index:
         t[0..9] FROM i (and SELF)

This means that each of the cells depends on i (it may be modified if i is the index of that particular cell). Each cell may also keep its value (if i indicates another cell): this is indicated with the (and SELF) mention in the latest version, and was indicated with a more obscure (and default:true) in previous versions.

like image 88
Pascal Cuoq Avatar answered Sep 23 '22 06:09

Pascal Cuoq


Static code analysis tools are pretty good at finding typical programming errors like the use of uninitialized variables. Here is a list of free tools that do this for C.

Unfortunately I can't recommend any of the tools in the list. I am only familiar with two commercial products, Coverity and Klocwork. Coverity is very good (and expensive). Klocwork is so so (but less expensive).

like image 28
Miguel Avatar answered Sep 19 '22 06:09

Miguel


What I did in the end is removed all static qualifiers from the code by '#define static'. This turns uninitialised static usage into invalid use, and the type of abuse I am hunting can be uncovered by the tools.

In my actual case this was enough to determine the place of the bug, but in a more general situation it should be refined if static's are actually doing something important, by gradually re-adding 'static' when the code fails to continue.

like image 29
takbal Avatar answered Sep 21 '22 06:09

takbal


I don't know of any library that does this for you, but I would look into using regular expressions to find them. Something like

rgrep "static\s*int" path/to/src/root | grep -v = | grep -v "("

That should return all static int variables declared without an equals sign, and the last pipe should remove anything with parenthesis in them (getting rid of funcions). There's a good change that this won't work exactly for you, but playing around with grep may be the fastest way for you to track this down.

Of course, once you find one that works you can replace int with all of the other kinds of variables to search for those too. HTH

like image 28
dbeer Avatar answered Sep 19 '22 06:09

dbeer