I have done a sched_setaffinity test in Linux in a server with 1 socket ,4 cores , the following /proc/cpuinfo showes the cpu information :
processor : 0
model name : Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz
cache size : 2048 KB
physical id : 0
siblings : 4
cpu cores : 4
processor : 1
model name : Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz
cache size : 2048 KB
physical id : 0
siblings : 4
cpu cores : 4
processor : 2
model name : Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz
cache size : 2048 KB
physical id : 0
siblings : 4
cpu cores : 4
processor : 3
model name : Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz
cache size : 2048 KB
physical id : 0
siblings : 4
cpu cores : 4
I have a simple test application :
struct foo {
int x;
int y;
} ;
//globar var
volatile struct foo fvar ;
pid_t gettid( void )
{
return syscall( __NR_gettid );
}
void *test_func0(void *arg)
{
int proc_num = (int)(long)arg;
cpu_set_t set;
CPU_ZERO( &set );
CPU_SET( proc_num, &set );
printf("proc_num=(%d)\n",proc_num) ;
if (sched_setaffinity( gettid(), sizeof( cpu_set_t ), &set ))
{
perror( "sched_setaffinity" );
return NULL;
}
int i=0;
for(i=0;i<1000000000;++i){
__sync_fetch_and_add(&fvar.x,1);
}
return NULL;
} //test_func0
compiled : gcc testsync.c -D_GNU_SOURCE -lpthread -o testsync.exe The following is the test results :
2 threads running test_func0 in core 0,1 take 35 secs ;
2 threads running test_func0 in core 0,2 take 55 secs ;
2 threads running test_func0 in core 0,3 take 55 secs ;
2 threads running test_func0 in core 1,2 take 55 secs ;
2 threads running test_func0 in core 1,3 take 55 secs ;
2 threads running test_func0 in core 2,3 take 35 secs ;
I wonder why 2 threads running in core (0,1) or in core(2,3) would be much faster in others ? if I running 2 threads at the same core , like core(1,1) , core(2,2),core(3,3) , that would be take 28 secs , also confused why this happen ?
Cores 0 and 1 share an L2 cache, and so do cores 2 and 3. Running on two cores that share the cache makes the shared variable stay in the L2 cache, which makes things faster.
This is not true in today's Intel processors, where L2 is per core. But on the CPU you're using, this is how it works (it's actually a quad-core CPU made by gluing together two dual-core CPUs).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With