Global Variables performance effect (c, c++)

I'm currently developing a very fast algorithm, one part of which is an extremely fast scanner and statistics function. In this quest, I'm after any performance benefit. I'm also interested in keeping the code multi-thread friendly.

Now for the question: I've noticed that putting some very frequently accessed variables and arrays into global or static local storage (which behaves the same) gives a measurable performance benefit (in the range of +10%). I'm trying to understand why, and to find a way around it, since I would prefer to avoid these types of allocation. Note that I don't think the difference comes from allocation, since allocating a few variables and a small array on the stack is almost instantaneous. I believe the difference comes from accessing and modifying the data.

While searching, I found this old Stack Overflow post: C++ performance of global variables

But I'm very disappointed by the answers there. There is very little explanation, mostly ranting about "you should not do that" (hey, that's not the question!) and very rough statements like "it doesn't affect performance", which is obviously incorrect, since I'm measuring it with precise benchmark tools.

As said above, I'm looking for an explanation and, if one exists, a solution to this issue. So far, my feeling is that calculating the memory address of a local (dynamic) variable costs a bit more than for a global (or local static) one, maybe something like one extra ADD operation. But that doesn't help in finding a solution...

asked Mar 06 '11 13:03

Cyan

4 Answers

It really depends on your compiler, platform, and other details. However, I can describe one scenario where global variables are faster.

In many cases, a global variable is at a fixed offset. This allows the generated instructions to simply use that address directly. (Something along the lines of MOV AX,[MyVar].)

However, if you have a variable that's relative to the current stack pointer or a member of a class or array, some math is required to take the address of the array and determine the address of the actual variable.

Obviously, if you need to place some sort of mutex on your global variable to keep it thread-safe, the locking will almost certainly cost more than any performance gain.
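As a rough sketch of the two addressing situations described above, the following compares a file-scope table with a table owned by the caller. The names g_stats, count_global, count_into and global_count are hypothetical, and the 256-entry byte histogram is an assumption based on the loop quoted elsewhere in this thread. Whether the fixed address is actually cheaper depends on the compiler and target, so treat this as something to benchmark, not as a result.

```c
#include <assert.h>
#include <stddef.h>

/* Variant A: file-scope table. Its address is fixed at link time, so the
   compiler may be able to encode it directly in the instruction
   (this is target- and compiler-dependent). */
static unsigned g_stats[256];

void count_global(const unsigned char *p, size_t n)
{
    assert(p != NULL);
    for (size_t i = 0; i < n; i++)
        g_stats[p[i]]++;
}

/* Variant B: caller-owned table. The base address arrives as an argument
   and is indexed from there. Each thread can pass its own table, so no
   lock is needed around the updates. */
void count_into(unsigned stats[256], const unsigned char *p, size_t n)
{
    assert(stats != NULL && p != NULL);
    for (size_t i = 0; i < n; i++)
        stats[p[i]]++;
}

/* Hypothetical accessor so callers can read the table of variant A. */
unsigned global_count(unsigned char symbol)
{
    return g_stats[symbol];
}
```

With variant B, each thread would typically fill its own table and the tables would be summed once at the end, which avoids the mutex entirely.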

answered Oct 25 '22 16:10

Jonathan Wood

Creating local variables can be literally free if they are POD types. You are likely overflowing a cache line with too many stack variables, or hitting a similar alignment-related effect that is very specific to your piece of code. I usually find that non-local variables significantly decrease performance.

answered Oct 25 '22 17:10

Puppy

It's hard to beat static allocation for speed, and while the 10% is a pretty small difference, it could be due to address calculation.

But if you're looking for speed, the loop you posted in a comment, while(p<end) stats[*p++]++; is an obvious candidate for unrolling, such as:

static int stats[M];
static int index_array[N];
int *p = index_array, *pend = p+N;
// ... initialize the arrays ...
while (pend - p >= 8){  // at least 8 entries left; also safe when N < 8
  stats[p[0]]++;
  stats[p[1]]++;
  stats[p[2]]++;
  stats[p[3]]++;
  stats[p[4]]++;
  stats[p[5]]++;
  stats[p[6]]++;
  stats[p[7]]++;
  p += 8;
}
while(p<pend) stats[*p++]++;

Don't count on the compiler to do it for you. It might or might not be able to figure it out.
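To check that an unrolled loop like this still produces the same counts, a self-contained variant can be compared against the simple loop. This is only a sketch: the function names histogram_simple and histogram_unrolled8 are made up here, and it assumes byte input with a 256-entry table passed in by the caller, where the snippet above uses static int arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: one table update per iteration. */
void histogram_simple(unsigned stats[256], const unsigned char *p, size_t n)
{
    assert(stats != NULL && p != NULL);
    const unsigned char *end = p + n;
    while (p < end)
        stats[*p++]++;
}

/* Unrolled version: eight updates per iteration, then a tail loop for the
   remaining 0..7 elements. Testing the remaining length (end - p) avoids
   forming a pointer before the start of a short buffer. */
void histogram_unrolled8(unsigned stats[256], const unsigned char *p, size_t n)
{
    assert(stats != NULL && p != NULL);
    const unsigned char *end = p + n;
    while (end - p >= 8) {
        stats[p[0]]++; stats[p[1]]++; stats[p[2]]++; stats[p[3]]++;
        stats[p[4]]++; stats[p[5]]++; stats[p[6]]++; stats[p[7]]++;
        p += 8;
    }
    while (p < end)
        stats[*p++]++;
}
```

Running both functions over the same buffer into two zeroed tables and comparing the tables entry by entry is a cheap way to confirm the unrolling did not change the result before measuring its speed.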

Other possible optimizations come to mind, but they depend on what you're actually trying to do.

answered Oct 25 '22 18:10

Mike Dunlavey

If you have something like

int stats[256]; while (p<end) stats[*p++]++;

static int stats[256]; while (p<end) stats[*p++]++;

you are not really comparing the same thing, because in the first case the array is never initialized. Written explicitly, the second line is equivalent to

static int stats[256] = { 0 }; while (p<end) stats[*p++]++;

So for a fair comparison, the first should read

 int stats[256] = { 0 }; while (p<end) stats[*p++]++;

Your compiler might deduce much more if it has the variables in a known state.

Now then, there could be a runtime advantage for the static case, since the initialization is done at compile time (or program startup).
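The difference between the two declarations can be made visible with a small sketch. The function names count_static and count_local are hypothetical, and the extra symbol parameter exists only so the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Static table: zeroed once before the program starts, then it keeps its
   contents between calls, so the counts accumulate. */
unsigned count_static(const unsigned char *p, size_t n, unsigned char symbol)
{
    static unsigned stats[256];
    assert(p != NULL);
    for (size_t i = 0; i < n; i++)
        stats[p[i]]++;
    return stats[symbol];
}

/* Automatic table: the "= { 0 }" initializer clears all 256 entries on
   every call, which is work the static variant never repeats. */
unsigned count_local(const unsigned char *p, size_t n, unsigned char symbol)
{
    unsigned stats[256] = { 0 };
    assert(p != NULL);
    for (size_t i = 0; i < n; i++)
        stats[p[i]]++;
    return stats[symbol];
}
```

Calling each function twice on the same input shows that the two are not equivalent: count_local returns the same count both times, while count_static returns twice the count on the second call because nothing resets its table.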

To test whether this accounts for your difference, run the same function with the static declaration and the loop several times, and see whether the difference vanishes as the number of invocations grows.
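One possible shape for such a test is sketched below, using the standard clock() function. The names hist_static, hist_local, time_calls and count_fn are invented for this illustration, and clock() is fairly coarse, so a real measurement would need many calls and several repetitions.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef unsigned (*count_fn)(const unsigned char *p, size_t n);

/* Static table: cleared once at startup, counts accumulate across calls. */
unsigned hist_static(const unsigned char *p, size_t n)
{
    static unsigned stats[256];
    for (size_t i = 0; i < n; i++)
        stats[p[i]]++;
    return stats[0];
}

/* Automatic table: cleared again on every call. */
unsigned hist_local(const unsigned char *p, size_t n)
{
    unsigned stats[256] = { 0 };
    for (size_t i = 0; i < n; i++)
        stats[p[i]]++;
    return stats[0];
}

/* Runs fn 'calls' times over the same buffer and returns the elapsed CPU
   time in seconds. The volatile sink keeps the results observable so the
   calls are not discarded by the optimizer. */
double time_calls(count_fn fn, const unsigned char *p, size_t n,
                  unsigned long calls)
{
    volatile unsigned sink = 0;
    assert(fn != NULL && p != NULL);
    clock_t start = clock();
    for (unsigned long c = 0; c < calls; c++)
        sink += fn(p, n);
    clock_t stop = clock();
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Timing both variants for a growing number of calls, and also for a growing buffer size, should show whether the gap stays proportional to the number of calls (the per-call clearing) or shrinks relative to the total.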

But as others have said already, it is best to inspect the assembly that your compiler produces to see what actual differences there are in the generated code.

answered Oct 25 '22 17:10

Jens Gustedt

Tags: performance, c, static, benchmarking, global