I'm currently developing a very fast algorithm, one part of which is an extremely fast scanner and statistics function. In this quest, I'm after any performance benefit. I'm also interested in keeping the code multi-thread friendly.
Now for the question: I've noticed that putting some very frequently accessed variables and arrays into global storage, or "static local" storage (which behaves the same way), gives a measurable performance benefit (in the range of +10%). I'm trying to understand why, and to find a solution, since I would prefer to avoid these types of allocation. Note that I don't think the difference comes from allocation itself, since allocating a few variables and a small array on the stack is almost instantaneous. I believe the difference comes from accessing and modifying the data.
In this search, I found this old Stack Overflow post: C++ performance of global variables
But I'm very disappointed by the answers there. There is very little explanation, mostly ranting about "you should not do that" (hey, that's not the question!) and very rough statements like "it doesn't affect performance", which is obviously incorrect, since I'm measuring it with precise benchmark tools.
As said above, I'm looking for an explanation and, if one exists, a solution to this issue. So far, I've got the feeling that calculating the memory address of a local (stack) variable costs a bit more than for a global (or local static) one. Maybe something like one extra ADD operation. But that doesn't help in finding a solution...
Global variables are really slow, in addition to all the other reasons not to use them.
Using global variables causes very tight coupling of code. Using global variables causes namespace pollution. This may lead to unnecessarily reassigning a global value.
Coalescing global variables causes variables that are frequently used together to be mapped close together in memory. This strategy improves performance in the same way that changing external variables to static variables does.
Short answer - No, good programmers make code go faster by knowing and using the appropriate tools for the job, and then optimizing in a methodical way where their code does not meet their requirements.
It really depends on your compiler, platform, and other details. However, I can describe one scenario where global variables are faster.
In many cases, a global variable is at a fixed offset. This allows the generated instructions to simply use that address directly. (Something along the lines of MOV AX,[MyVar].)
However, if you have a variable that's relative to the current stack pointer or a member of a class or array, some math is required to take the address of the array and determine the address of the actual variable.
Obviously, if you need to place some sort of mutex on your global variable in order to keep it thread-safe, then you'll almost certainly more than lose any performance gain.
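As a rough sketch of the two addressing cases described above, the same counting loop can be written once against a static array and once against caller-provided storage. The names count_static, count_into, static_stats and g_stats are made up for illustration, and which instructions you actually get depends on your compiler, flags and target:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NSYMBOLS 256

/* Static storage: the array lives at an address fixed at link/load time. */
static unsigned g_stats[NSYMBOLS];

/* Variant A: counts byte values into the static array (not reentrant). */
void count_static(const unsigned char *p, const unsigned char *end)
{
    assert(p != NULL && end != NULL);
    memset(g_stats, 0, sizeof g_stats);
    while (p < end)
        g_stats[*p++]++;
}

/* Variant B: counts into storage supplied by the caller, addressed through
   a pointer held in a register. Each thread can pass its own array. */
void count_into(const unsigned char *p, const unsigned char *end,
                unsigned stats[NSYMBOLS])
{
    assert(p != NULL && end != NULL && stats != NULL);
    memset(stats, 0, NSYMBOLS * sizeof stats[0]);
    while (p < end)
        stats[*p++]++;
}

/* Read-only access to the static counters, for comparing the two variants. */
const unsigned *static_stats(void)
{
    return g_stats;
}
```

Compiling both variants to assembly (for example with -O2 -S on GCC or Clang) and comparing the inner loops shows whether your compiler really emits a different address calculation for the two cases.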
Creating local variables can be literally free if they are POD types. You likely are overflowing a cache line with too many stack variables or other similar alignment-based causes which are very specific to your piece of code. I usually find that non-local variables significantly decrease performance.
It's hard to beat static allocation for speed, and while the 10% is a pretty small difference, it could be due to address calculation.
But if you're looking for speed, the example from your comment, while(p<end) stats[*p++]++; is an obvious candidate for unrolling, such as:
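static int stats[M];
static int index_array[N];
int *p = index_array, *pend = p + N;
// ... initialize the arrays ...
// Unrolled by 8: runs while at least 8 entries remain
// (also safe when N < 8, since pend-8 is never formed)
while (pend - p >= 8) {
    stats[p[0]]++;
    stats[p[1]]++;
    stats[p[2]]++;
    stats[p[3]]++;
    stats[p[4]]++;
    stats[p[5]]++;
    stats[p[6]]++;
    stats[p[7]]++;
    p += 8;
}
// Tail loop: the remaining 0 to 7 entries
while (p < pend) stats[*p++]++;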
Don't count on the compiler to do it for you. It might or might not be able to figure it out.
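Before trusting a hand-unrolled loop, it is worth checking it against the simple one. Here is a minimal sketch of that check; histogram_simple and histogram_unrolled are illustrative names, and the caller is assumed to guarantee that every index fits inside stats:

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: one increment per iteration. */
void histogram_simple(const int *idx, size_t n, int *stats)
{
    assert(idx != NULL && stats != NULL);
    for (size_t i = 0; i < n; i++)
        stats[idx[i]]++;
}

/* Unrolled by 8, followed by a tail loop for the last 0 to 7 entries. */
void histogram_unrolled(const int *idx, size_t n, int *stats)
{
    const int *p = idx, *pend = idx + n;

    assert(idx != NULL && stats != NULL);
    while (pend - p >= 8) {
        stats[p[0]]++;
        stats[p[1]]++;
        stats[p[2]]++;
        stats[p[3]]++;
        stats[p[4]]++;
        stats[p[5]]++;
        stats[p[6]]++;
        stats[p[7]]++;
        p += 8;
    }
    while (p < pend)
        stats[*p++]++;
}
```

Feeding both functions an input whose length is not a multiple of 8 exercises the tail loop as well as the unrolled body.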
Other possible optimizations come to mind, but they depend on what you're actually trying to do.
If you have something like
int stats[256]; while (p<end) stats[*p++]++;
static int stats[256]; while (p<end) stats[*p++]++;
you are not really comparing the same thing, because in the first case the array is never initialized. Written explicitly, the second line is equivalent to
static int stats[256] = { 0 }; while (p<end) stats[*p++]++;
So for a fair comparison the first line should read
int stats[256] = { 0 }; while (p<end) stats[*p++]++;
Your compiler might be able to deduce a lot more if it knows the variables start in a known state.
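Even with both arrays starting from zero, the two declarations do not behave identically: the automatic array is cleared on every call, while the static one is zeroed once and keeps its counts between calls. A small sketch of that difference follows, where the function names and the symbol parameter are only there for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Automatic array: "= { 0 }" clears all 256 counters on every call. */
unsigned count_auto(const unsigned char *p, const unsigned char *end,
                    unsigned char symbol)
{
    unsigned stats[256] = { 0 };

    assert(p != NULL && end != NULL);
    while (p < end)
        stats[*p++]++;
    return stats[symbol];
}

/* Static array: zero-initialized once before main() runs, never cleared
   again here, so the counts accumulate across calls. */
unsigned count_static_local(const unsigned char *p, const unsigned char *end,
                            unsigned char symbol)
{
    static unsigned stats[256];

    assert(p != NULL && end != NULL);
    while (p < end)
        stats[*p++]++;
    return stats[symbol];
}
```

So if the static version has to produce fresh statistics on every call, it needs an explicit clear as well, and that clear should be part of what you measure.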
Now then, there could be a runtime advantage in the static case, since the initialization is done at compile time (or program startup).
To test whether this accounts for your difference, run the same function, with the static declaration and the loop, several times and see whether the difference vanishes as the number of invocations grows.
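One way to run that experiment is a small timing helper built on the standard clock() function. This is only a sketch: time_calls, scan_auto, scan_static and the sink variable are hypothetical names, and clock() is coarse, so a precise benchmark tool will give better numbers:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*scan_fn)(const unsigned char *p, const unsigned char *end);

/* Written by each scan so the compiler cannot discard the counting loop. */
static volatile unsigned sink;

/* Automatic array, cleared on every call. */
void scan_auto(const unsigned char *p, const unsigned char *end)
{
    unsigned stats[256] = { 0 };
    while (p < end)
        stats[*p++]++;
    sink = stats[0];
}

/* Static array, zeroed once at program startup. */
void scan_static(const unsigned char *p, const unsigned char *end)
{
    static unsigned stats[256];
    while (p < end)
        stats[*p++]++;
    sink = stats[0];
}

/* CPU seconds spent in `calls` invocations of fn over the same buffer. */
double time_calls(scan_fn fn, const unsigned char *p,
                  const unsigned char *end, long calls)
{
    assert(fn != NULL && calls >= 0);
    clock_t start = clock();
    for (long i = 0; i < calls; i++)
        fn(p, end);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_calls with scan_auto and scan_static for increasing values of calls, over a buffer of realistic size, shows whether the gap between the two stays proportional or shrinks.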
But as others have already said, the best approach is to inspect the assembly that your compiler produces, to see what differences there actually are in the generated code.