
C++ using global variable shows 100% slower than a pointer, when using pthread?

I wrote a quick test to compare the performance of two similar programs. Both use 2 threads to do a calculation. The core difference is that one uses a global variable and the other uses a "new" object, as below:

#include<pthread.h>
#include<stdlib.h>
struct M{
    long a;
    long b;
}obj;
size_t count=2000000000;
void* addx(void*args){
    long*pl=(long*)args;
    for(size_t i=0;i<count;++i)
        (*pl)*=i;
    return NULL;
}
int main(int argc,char*argv[]){
    pthread_t tid[2];
    pthread_create(&tid[0],NULL,addx,&obj.a);
    pthread_create(&tid[1],NULL,addx,&obj.b);
    pthread_join(tid[0],NULL);
    pthread_join(tid[1],NULL);
    return 0;
}

clang++ test03_threads.cpp -o test03_threads -lpthread -O2 && time ./test03_threads

real    0m3.626s
user    0m6.595s
sys     0m0.009s

It's quite slow. Then I changed obj to be dynamically allocated (I expected that to be even slower):

#include<pthread.h>
#include<stdlib.h>
struct M{
    long a;
    long b;
}*obj;//difference 1
size_t count=2000000000;
void* addx(void*args){
    long*pl=(long*)args;
    for(size_t i=0;i<count;++i)
        (*pl)*=i;
    return NULL;
}
int main(int argc,char*argv[]){
    obj=new M;//difference 2
    pthread_t tid[2];
    pthread_create(&tid[0],NULL,addx,&obj->a);//difference 3
    pthread_create(&tid[1],NULL,addx,&obj->b);//difference 4
    pthread_join(tid[0],NULL);
    pthread_join(tid[1],NULL);
    delete obj;//difference 5
    return 0;
}

clang++ test03_threads_new.cpp -o test03_threads_new -lpthread -O2 && time ./test03_threads_new

real    0m1.880s
user    0m3.745s
sys     0m0.007s

Amazingly, it's about 100% faster than the previous one (roughly half the run time). I also tried g++ on Linux, with the same result. How can this be explained? I know obj is a global variable, but *obj is still a global variable, just dynamically created. What's the core difference?

I think that this is indeed because of false sharing, as Unimportant suggested.

Why the difference then, you may ask?

Because of the count variable! It is a non-const global, and size_t's underlying type happens to be unsigned long for you, so the compiler cannot optimize the reads of it away: a long* is allowed to alias an unsigned long, so pl could point to count. If count were an int, the strict aliasing rules would let the compiler optimize the reads away (or count could simply be a const size_t).

So the generated code has to re-read count on every iteration of the loop.
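To illustrate the aliasing point, here is a minimal sketch of a variant of addx that copies count into a local before the loop. The name addx_local is hypothetical and not from the question. A local whose address is never taken cannot be written through pl, so the compiler is free to keep the loop bound in a register; whether it actually does so depends on the compiler and flags.

```cpp
#include <cstddef>

std::size_t count = 2000000000;  // same non-const global as in the question

// Hypothetical variant of addx: read the global once, then loop on a local.
// *pl cannot alias n, so the bound does not have to be reloaded from memory.
void* addx_local(void* args) {
    long* pl = static_cast<long*>(args);
    const std::size_t n = count;  // single read of the global
    for (std::size_t i = 0; i < n; ++i)
        (*pl) *= i;
    return nullptr;
}
```

This only removes the repeated reads of count. obj.a and obj.b still sit next to each other, so the two threads keep writing to the same cache line.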

In the first example, count and obj are both global variables, so they are placed near each other. There is a high probability that the linker puts them into the same cache line. Writing to obj.a or obj.b then invalidates the cache line that holds count in the other core's cache, so the CPU has to synchronize the reads of count.
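Whether count and obj really share a cache line can be checked by comparing their addresses. The helper below is a hypothetical sketch, and it assumes 64-byte cache lines, which is typical for x86-64 but not guaranteed on other CPUs.

```cpp
#include <cstdint>

// Assumption: 64-byte cache lines (typical for x86-64, not universal).
constexpr std::uintptr_t kCacheLine = 64;

// Returns true if both addresses fall into the same assumed cache line.
bool same_cache_line(const void* p, const void* q) {
    const auto a = reinterpret_cast<std::uintptr_t>(p);
    const auto b = reinterpret_cast<std::uintptr_t>(q);
    return a / kCacheLine == b / kCacheLine;
}
```

Calling same_cache_line(&count, &obj.a) in the first program and same_cache_line(&count, &obj->a) in the second would show whether the layout matches this explanation. The result depends on the compiler and linker, so no particular output is claimed here.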

In the second example, obj is allocated on the heap, so its address will be far enough from count that they won't occupy the same cache line. No synchronization is needed for count.
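One way to act on this diagnosis, which the explanation above does not itself propose, is to force each written variable onto its own cache line. The sketch below again assumes 64-byte cache lines, and PaddedLong is a hypothetical helper type introduced only for illustration.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines. alignas rounds both the alignment and
// the size of the struct up to 64 bytes, so each instance owns a full line.
struct alignas(64) PaddedLong {
    long value;
};

// Same shape as M in the question, but a and b no longer share a line,
// and nothing else can be placed in the lines they occupy.
struct M {
    PaddedLong a;
    PaddedLong b;
};

static_assert(sizeof(PaddedLong) == 64, "one element per assumed cache line");
static_assert(offsetof(M, b) == 64, "b starts on the next assumed cache line");
```

With this layout the threads would be passed &obj.a.value and &obj.b.value. Whether it helps in practice has to be measured; the timings in the question were taken without it.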

Tags: c++, performance, variables, linux, pthreads

Troskyvs

geza
