
Porting threads to Windows. Critical sections are very slow

I'm porting some code to Windows and found threading to be extremely slow. The task takes 300 seconds on Windows (two Xeon E5-2670 CPUs, 8 cores each at 2.6 GHz, 16 cores in total) and 3.5 seconds on Linux (one Xeon E5-1607, 4 cores at 3 GHz). I'm using VS2012 Express.

I've got 32 threads, each calling EnterCriticalSection(), popping an 80-byte job off a std::stack, calling LeaveCriticalSection() and then doing some work (250k jobs in total).

Before and after every critical section call I print the thread ID and current time.

  • The wait time for a single thread's lock is ~160ms
  • To pop the job off the stack takes ~3ms
  • Calling leave takes ~3ms
  • The job takes ~1ms

(Roughly the same for Debug and Release; Debug takes a little longer. I'd love to be able to properly profile the code :P)

Commenting out the job call makes the whole process take 2 seconds (still more than linux).

I've tried both QueryPerformanceCounter and timeGetTime; both give approximately the same result.

AFAIK the job never makes any sync calls, but I can't explain the slowdown unless it does.

I have no idea why copying from a stack and calling pop takes so long. Another very confusing thing is why the call to LeaveCriticalSection() takes so long.

Can anyone speculate on why it's running so slowly?

I wouldn't have thought the difference in processors would give a 100x performance difference, but could it be related to having dual CPUs (having to sync between separate CPUs rather than between cores on the same CPU)?

By the way, I'm aware of std::thread but want my library code to work with pre C++11.

edit

//in a while(hasJobs) loop...

EVENT qwe1 = {"lock", timeGetTime(), id};
events.push_back(qwe1);

scene->jobMutex.lock();

EVENT qwe2 = {"getjob", timeGetTime(), id};
events.push_back(qwe2);

hasJobs = !scene->jobs.empty();
if (hasJobs)
{
    job = scene->jobs.front();
    scene->jobs.pop();
}

EVENT qwe3 = {"gotjob", timeGetTime(), id};
events.push_back(qwe3);

scene->jobMutex.unlock();

EVENT qwe4 = {"unlock", timeGetTime(), id};
events.push_back(qwe4);

if (hasJobs)
    scene->performJob(job);

And the mutex class, with the Linux #ifdef parts removed:

CRITICAL_SECTION mutex;

...

Mutex::Mutex()
{
    InitializeCriticalSection(&mutex);
}
Mutex::~Mutex()
{
    DeleteCriticalSection(&mutex);
}
void Mutex::lock()
{
    EnterCriticalSection(&mutex);
}
void Mutex::unlock()
{
    LeaveCriticalSection(&mutex);
}
jozxyqk asked Nov 11 '22 23:11



1 Answer

A Windows CRITICAL_SECTION spins in a tight loop when you first try to enter it. It does not suspend the thread that called EnterCriticalSection unless a substantial period has elapsed in the spin loop. So having 32 threads contending for the same critical section will burn and waste a lot of CPU cycles. Try a mutex instead (see CreateMutex).
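As a rough sketch of what that swap could look like, here is a drop-in class with the same lock()/unlock() interface as the Mutex in the question, backed by CreateMutex, WaitForSingleObject and ReleaseMutex. The name KernelMutex is made up for illustration, and the pthread branch is only an assumption about what the removed Linux #ifdef side might resemble. A kernel mutex costs more than a critical section when there is no contention, so whether it helps here is something to measure rather than assume.

```cpp
// Sketch only: KernelMutex is a hypothetical name, not from the original post.
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
#include <pthread.h>
#endif

class KernelMutex
{
public:
    KernelMutex();
    ~KernelMutex();
    void lock();
    void unlock();

private:
    // Non-copyable, pre-C++11 style: declared but never defined.
    KernelMutex(const KernelMutex&);
    KernelMutex& operator=(const KernelMutex&);

#ifdef _WIN32
    HANDLE handle;
#else
    pthread_mutex_t handle;
#endif
};

#ifdef _WIN32
KernelMutex::KernelMutex()
{
    // Default security attributes, not initially owned, unnamed.
    handle = CreateMutex(NULL, FALSE, NULL);
    assert(handle != NULL);
}
KernelMutex::~KernelMutex()
{
    CloseHandle(handle);
}
void KernelMutex::lock()
{
    // Waits on the kernel object; a waiting thread is suspended, not spinning.
    // Note: WAIT_ABANDONED is also possible if an owning thread exited
    // without releasing; this sketch treats that as a bug.
    DWORD result = WaitForSingleObject(handle, INFINITE);
    assert(result == WAIT_OBJECT_0);
    (void)result; // silence unused-variable warnings when NDEBUG is defined
}
void KernelMutex::unlock()
{
    ReleaseMutex(handle);
}
#else
// Assumed Linux counterpart, so the sketch builds on both platforms.
KernelMutex::KernelMutex()
{
    int result = pthread_mutex_init(&handle, NULL);
    assert(result == 0);
    (void)result;
}
KernelMutex::~KernelMutex()
{
    pthread_mutex_destroy(&handle);
}
void KernelMutex::lock()
{
    pthread_mutex_lock(&handle);
}
void KernelMutex::unlock()
{
    pthread_mutex_unlock(&handle);
}
#endif
```

In the question's code, this would mean changing the type of scene->jobMutex; the worker loop itself stays the same.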

ScottMcP-MVP answered Nov 15 '22 05:11
