I've got a quick test to compare the performance of 2 similar programs; both use 2 threads to do a calculation. The core difference is that one uses a global variable and the other uses a "new" object, as below:
#include <pthread.h>
#include <stdlib.h>

struct M {
    long a;
    long b;
} obj;

size_t count = 2000000000;

void* addx(void* args) {
    long* pl = (long*)args;
    for (size_t i = 0; i < count; ++i)
        (*pl) *= i;
    return NULL;
}

int main(int argc, char* argv[]) {
    pthread_t tid[2];
    pthread_create(&tid[0], NULL, addx, &obj.a);
    pthread_create(&tid[1], NULL, addx, &obj.b);
    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    return 0;
}
clang++ test03_threads.cpp -o test03_threads -lpthread -O2 && time ./test03_threads
real 0m3.626s
user 0m6.595s
sys 0m0.009s
It's quite slow. Then I modified obj to be dynamically created (I expected it to be even slower):
#include <pthread.h>
#include <stdlib.h>

struct M {
    long a;
    long b;
} *obj; // difference 1

size_t count = 2000000000;

void* addx(void* args) {
    long* pl = (long*)args;
    for (size_t i = 0; i < count; ++i)
        (*pl) *= i;
    return NULL;
}

int main(int argc, char* argv[]) {
    obj = new M; // difference 2
    pthread_t tid[2];
    pthread_create(&tid[0], NULL, addx, &obj->a); // difference 3
    pthread_create(&tid[1], NULL, addx, &obj->b); // difference 4
    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    delete obj; // difference 5
    return 0;
}
clang++ test03_threads_new.cpp -o test03_threads_new -lpthread -O2 && time ./test03_threads_new
real 0m1.880s
user 0m3.745s
sys 0m0.007s
Amazingly, it's about twice as fast as the previous one. I also tried g++ on Linux, with the same result. But how can this be explained? I know obj is a global variable, but *obj is still effectively global, just dynamically created. What's the core difference?
Global variables can be slow to access, in addition to all the other reasons not to use them.
The thing about local variables is that the compiler can often keep them in registers, or on the stack if not, because it can prove that nothing else refers to them. This is why local variables are often faster.
Well, using global variables does not impact CPU performance directly.
A static variable can be either a global or local variable. Both are created by preceding the variable declaration with the keyword static. A local static variable is a variable that can maintain its value from one function call to another and it will exist until the program ends.
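To make the local/global static distinction concrete, here is a minimal sketch. The names next_id and file_scope_hits are hypothetical and chosen only for illustration; they do not appear in the question.

```cpp
// File-scope static: visible only inside this translation unit,
// and alive for the whole run of the program.
static int file_scope_hits = 0;

// Returns 1 on the first call, 2 on the second, and so on.
int next_id() {
    // Local static: initialized once, keeps its value between calls.
    static int counter = 0;
    ++file_scope_hits;
    return ++counter;
}
```

Calling next_id() repeatedly yields increasing values because counter is not re-initialized on each call, unlike an ordinary local variable.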
I think that this is indeed because of false sharing, as Unimportant suggested.
Why the difference then, you may ask?
Because of the count variable! As this is a non-const variable, and size_t's underlying type happens to be unsigned long for you (which a long* is allowed to alias), the compiler cannot optimize the load of count out of the loop, because pl could point to count. If count were an int, the strict aliasing rules would let the compiler optimize it away (or it could simply be const size_t).
So the generated code has to re-read count on every iteration of the loop.
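As a rough sketch of the aliasing point, one way to let the compiler stop re-reading the bound is to copy it into a local before the loop. The names loop_count, add_reload and add_hoisted are hypothetical; loop_count stands in for the question's count with a small value, and the loop body uses += instead of *= only so that the result is non-zero and easy to check.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the question's global `count` (small on purpose).
std::size_t loop_count = 10;

// Same shape as addx: the bound is a non-const global, so the compiler
// must assume a store through pl may change it and reload it each time.
void add_reload(long* pl) {
    for (std::size_t i = 0; i < loop_count; ++i)
        *pl += static_cast<long>(i);
}

// Variant: snapshot the bound into a local. A local whose address is never
// taken cannot be reached through pl, so it can stay in a register.
void add_hoisted(long* pl) {
    assert(pl != nullptr);
    const std::size_t n = loop_count;
    for (std::size_t i = 0; i < n; ++i)
        *pl += static_cast<long>(i);
}
```

The two functions behave the same as long as pl does not actually point at loop_count. Whether the hoisted variant also removes the slowdown in the original benchmark would need to be measured.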
In the first example, count and obj are both global variables, so they are placed near each other. There is a high probability that the linker puts these variables into the same cache line. Writing to obj.a or obj.b will then invalidate the cache line that holds count, so the CPU has to synchronize the reads of count.
In the second example, obj is allocated on the heap, so its address will be far enough from count that they won't occupy the same cache line. No synchronization is needed for count.
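The cache-line explanation can be checked with a small helper, and the read-mostly variable can be isolated with padding. This is only a sketch: kCacheLine, same_cache_line and IsolatedCount are hypothetical names, and the 64-byte line size is an assumption that holds on typical x86-64 machines but is not guaranteed.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache-line size in bytes (an assumption, not queried from hardware).
constexpr std::size_t kCacheLine = 64;

// True when both addresses fall into the same kCacheLine-sized block.
bool same_cache_line(const void* p, const void* q) {
    const auto x = reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
    const auto y = reinterpret_cast<std::uintptr_t>(q) / kCacheLine;
    return x == y;
}

// Wrapper that gives a loop bound a cache line of its own: alignas pads
// the struct to kCacheLine bytes, so no other object can share its line.
struct alignas(kCacheLine) IsolatedCount {
    std::size_t value;
};
```

In the first program, printing same_cache_line(&count, &obj.a) would show whether the linker really placed the two globals in one line. Storing the bound in an IsolatedCount is one possible way to keep writes to obj from touching that line.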