I'm porting some code to Windows and found threading to be extremely slow. The task takes 300 seconds on Windows (two Xeon E5-2670s, 8 cores each at 2.6 GHz, 16 cores in total) and 3.5 seconds on Linux (one Xeon E5-1607, 4 cores at 3 GHz). I'm using VS2012 Express.
I've got 32 threads, each calling EnterCriticalSection(), popping an 80-byte job off a std::stack, calling LeaveCriticalSection() and then doing some work (250k jobs in total).
Before and after every critical section call I print the thread ID and current time.
(The timings are roughly the same for Debug and Release; Debug takes a little longer. I'd love to be able to properly profile the code :P)
Commenting out the job call makes the whole process take 2 seconds (still more than on Linux).
I've tried both QueryPerformanceCounter and timeGetTime; both give approximately the same result.
AFAIK the job never makes any sync calls, but I can't explain the slowdown unless it does.
I have no idea why copying from a stack and calling pop takes so long. Another very confusing thing is why a call to leave() takes so long.
Can anyone speculate on why it's running so slowly?
I wouldn't have thought the difference in processors would give a 100x performance difference, but could it be related to having dual CPUs (having to sync between separate physical CPUs rather than between cores on the same chip)?
By the way, I'm aware of std::thread, but I want my library code to work with pre-C++11 compilers.
Edit:
//in a while(hasJobs) loop...
    EVENT qwe1 = {"lock", timeGetTime(), id};
    events.push_back(qwe1);
    scene->jobMutex.lock();
    EVENT qwe2 = {"getjob", timeGetTime(), id};
    events.push_back(qwe2);
    hasJobs = !scene->jobs.empty();
    if (hasJobs)
    {
        job = scene->jobs.front();
        scene->jobs.pop();
    }
    EVENT qwe3 = {"gotjob", timeGetTime(), id};
    events.push_back(qwe3);
    scene->jobMutex.unlock();
    EVENT qwe4 = {"unlock", timeGetTime(), id};
    events.push_back(qwe4);
    if (hasJobs)
        scene->performJob(job);
And the mutex class, with the Linux #ifdef code removed:
CRITICAL_SECTION mutex;
...
Mutex::Mutex()
{
    InitializeCriticalSection(&mutex);
}
Mutex::~Mutex()
{
    DeleteCriticalSection(&mutex);
}
void Mutex::lock()
{
    EnterCriticalSection(&mutex);
}
void Mutex::unlock()
{
    LeaveCriticalSection(&mutex);
}
Windows' CRITICAL_SECTION spins in a tight loop when you first try to enter it. It does not suspend the thread that called EnterCriticalSection unless a substantial period has elapsed in the spin loop. So having 32 threads contending for the same critical section will burn a lot of CPU cycles doing nothing useful. Try a mutex instead (see CreateMutex).
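To make the CreateMutex suggestion concrete, here is a minimal sketch of what a mutex-backed version of the wrapper might look like. The names KernelMutex and tryPopJob are made up for illustration and are not from the question, the job type is simplified to int, and the pthread branch is only a guess at what the removed Linux #ifdef code could look like. It sticks to pre-C++11 constructs.

```cpp
#include <stack>

#ifdef _WIN32
#include <windows.h>
#else
#include <pthread.h>
#endif

// Hypothetical stand-in for the question's Mutex class, backed by a kernel
// mutex object on Windows instead of a CRITICAL_SECTION.
class KernelMutex
{
public:
    KernelMutex();
    ~KernelMutex();
    void lock();
    void unlock();

private:
    // Declared but not defined: non-copyable in the pre-C++11 style.
    KernelMutex(const KernelMutex&);
    KernelMutex& operator=(const KernelMutex&);

#ifdef _WIN32
    HANDLE handle;
#else
    pthread_mutex_t handle;   // assumed Linux counterpart
#endif
};

#ifdef _WIN32
KernelMutex::KernelMutex()
{
    // Default security attributes, not initially owned, unnamed.
    handle = CreateMutex(NULL, FALSE, NULL);
}
KernelMutex::~KernelMutex()
{
    if (handle)
        CloseHandle(handle);
}
void KernelMutex::lock()
{
    // Blocks in the kernel until the mutex is available.
    WaitForSingleObject(handle, INFINITE);
}
void KernelMutex::unlock()
{
    ReleaseMutex(handle);
}
#else
KernelMutex::KernelMutex()  { pthread_mutex_init(&handle, NULL); }
KernelMutex::~KernelMutex() { pthread_mutex_destroy(&handle); }
void KernelMutex::lock()    { pthread_mutex_lock(&handle); }
void KernelMutex::unlock()  { pthread_mutex_unlock(&handle); }
#endif

// Hypothetical helper mirroring the loop body in the question: take one job
// under the lock and report whether there was one to take.
bool tryPopJob(KernelMutex& m, std::stack<int>& jobs, int& job)
{
    m.lock();
    bool hasJob = !jobs.empty();
    if (hasJob)
    {
        job = jobs.top();
        jobs.pop();
    }
    m.unlock();
    return hasJob;
}
```

Each worker thread would call tryPopJob in its loop and run the job outside the lock, as in the original code. A kernel mutex makes every lock and unlock more expensive than an uncontended critical section, so whether this actually helps with 32 threads is something to measure rather than assume.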