
Does fetching values into local variables allow greater optimization in C/C++?

I've often wondered this. Suppose you've got a loop in a function or method, like:

for (int i = 0; i < count; ++i)
    ...do stuff including function/method calls...;

Suppose count is some variable external to the function – a global variable, a data member of the object performing the loop, whatever. Not a local variable. It seems to me that the compiler doesn't know, in general, whether count might change in value across the work that the loop does, and therefore it has to re-fetch the value of count each time through the loop, from whatever memory location it lives in; it can't, for example, be kept in a register. If you instead wrote:

int local_count = count;

for (int i = 0; i < local_count; ++i)
    ...do stuff including function/method calls...;

then you're telling the compiler "the value of count should be cached in a local variable and used, even if the original value were to change". That would allow it to be, for example, placed in a register for the duration of the loop.

Assume that the value of count does not, in fact, change for the duration of the loop. Does this difference in coding style make a performance difference or not, with a typical modern compiler? How smart are compilers at figuring out "this value won't change and doesn't need to be re-fetched"? Note I'm not asking about volatile, which I understand; I'm asking about the possibility that the do stuff section of the code changes the value of count (which I know doesn't happen, but the compiler perhaps does not know).
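
To make the pattern concrete, here is a minimal sketch of the kind of code I have in mind (the names and the member layout are just for illustration):

struct Simulation {
    int count;                  // not local to run(): a data member of the object
    void do_work(int i);        // defined in another translation unit

    long run() {
        long total = 0;
        for (int i = 0; i < count; ++i) {  // must count be re-read from *this each pass?
            do_work(i);                    // for all the compiler knows, this could change count
            total += i;
        }
        return total;
    }
};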

This comes up even in simpler situations like:

use non-local variable foo;
...do stuff...;
use non-local variable foo again;

Assume that I know the value of foo doesn't change. Should I still cache its value in a local variable for maximal performance? Like:

auto local_foo = foo;

use local_foo;
...do stuff...;
use local_foo again;

I imagine this will depend on the compiler, but I'm hoping some general, useful statements can be made about how smart/dumb modern compilers tend to be about drawing this inference that a value has not changed, when set to a high optimization level. Does it depend on inlining? Link-time optimization? Other considerations, such as the use of pointers in the "do stuff" sections that the compiler presumably cannot assume do not point at the external variable in question? Is there any more elegant way of handling this problem than making local-variable copies of stuff all over the place?

(Please don't reply about premature optimization, tell me not to worry about such tiny performance details, etc.; I am asking specifically about the context where every little bit of performance really does matter for the code in question. I work on simulations that take days or weeks to run, often with an extreme hotspot in a short section of code. And please don't tell me I ought to hand-code such performance-sensitive code in assembler if I care so much; I'd love to, but my software has to run cross-platform on end-user machines that might be macOS, Linux, or Windows, so assembly is a non-starter, and indeed I don't know what compiler I'll be on. But I really do need to squeeze maximum performance out of the compiler, to the extent possible.)

asked Dec 06 '25 by bhaller

1 Answer

The state of optimization in the big three compilers is close to ideal in regard to value lifetime analysis. That is, if you write code that uses some object x (here x means some lvalue expression; it may be an external variable or an array element or a pointed-to object) or you write semantically equivalent code that defines a local const T t = x; and then uses t, then the compilers will largely generate equivalent code.
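For instance (the extern variable and function names here are only illustrative), these two routines are semantically equivalent, since nothing between the uses of x can modify it, and current optimizers will typically compile them to the same code:

extern int x;              // some object defined elsewhere

int uses_x_directly(void)
{
    return x * 2 + x;      // no intervening store or call, so x is loaded once anyway
}

int uses_cached_copy(void)
{
    const int t = x;       // local copy
    return t * 2 + t;      // same single load in practice
}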

A corollary to that is that, since we expect the generated code to be equivalent, we should prefer the code that is easier for humans to read and work with.

However, the question is not strictly about semantically equivalent code. If a routine has const T t = x; and then calls some routine not visible in the current translation unit, the compiler might not be able to determine that x does not change during the call, whereas it can see that t does not change. Or there can even be code in the routine that is not equivalent. For example, if a routine has two parameters int *a and int *b, and x is a[3], then b[2] = 7; b[3] = a[3]; and b[2] = 7; b[3] = t; are not equivalent because b[2] = 7; might change x (a[3]), but it cannot change t.
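Written out (the function and parameter names are mine, purely for illustration), the two forms of that example look like this:

void writes_then_copies(int *a, int *b)
{
    b[2] = 7;              // if b + 2 happens to alias a + 3, this changes a[3]
    b[3] = a[3];           // so a[3] must be re-read after the store
}

void copies_then_writes(int *a, int *b)
{
    const int t = a[3];    // a[3] captured before the store
    b[2] = 7;              // may still change a[3], but cannot change t
    b[3] = t;              // not equivalent to the version above when they alias
}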

Now, suppose you know that x does not change during execution of the routine, at least not as long as the routine is called the way it is supposed to be called. Then, by defining const T t = x; and thereafter using t in place of x, you are giving the compiler extra information. You are telling the compiler, “This value will not change during execution of this routine.” The compiler can see that t does not change whereas it might not have been able to see that x does not change. Adding information to an optimizer should only enable it to make better decisions. It can enable optimizations that might not otherwise be allowed.
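A minimal sketch of that effect, assuming an external variable and an opaque out-of-line call (both names are hypothetical):

extern long shared_total;          // non-local state
void report_progress(int i);       // defined in another translation unit

long sum_direct(int n)
{
    long s = 0;
    for (int i = 0; i < n; ++i) {
        report_progress(i);        // might modify shared_total, for all the compiler knows
        s += shared_total;         // so shared_total is re-loaded from memory each iteration
    }
    return s;
}

long sum_cached(int n)
{
    const long t = shared_total;   // promise: this value will not change during the loop
    long s = 0;
    for (int i = 0; i < n; ++i) {
        report_progress(i);
        s += t;                    // t need not be re-read; it can sit in a callee-saved register
    }
    return s;
}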

Of course, optimizers are not always perfect. We still use some heuristics to try to figure out the best code. But, in general, modern optimizers are very good, and the more information you give them, the better a result you will get. So it is generally preferable to define local variables to “cache” values you use repeatedly.

It is possible this can go too far. If a routine works with a lot of values, then const T t = x; can become an additional value that must be spilled to the stack at some point. Or it could cause some value other than t to be spilled to the stack.

So there is no absolute rule that using local variables like this will improve performance (or at least not impair it), but it is generally a good idea.

answered Dec 08 '25 by Eric Postpischil