
Low performance of boost::barrier, wait operation

I have a performance issue with boost::barrier. I measured the time of the wait() call: in the single-thread case, about 100000 repeated calls to wait() take around 0.5 s. With two threads this grows to 3 seconds, and it gets worse with every additional thread (I have an 8-core processor).

I implemented a custom method that provides the same functionality, and it is much faster.

Is it normal for this method to be so slow? Is there a faster way to synchronize threads in Boost (so that all threads wait until every thread has completed the current job and then proceed to the next task; just synchronization, no data transmission is required)?

I have been asked for my current code and what I want to achieve. In a loop I run a function whose work can be divided among many threads; however, all threads should finish the current loop run before the next run starts.

My current solution

volatile int barrierCounter1 =0; //it will store number of threads which completed current loop run
volatile bool barrierThread1[NumberOfThreads]; //it will store go signal for all threads with id > 0. All values are set to false at the beginning
boost::mutex mutexSetBarrierCounter; //mutex for barrierCounter1 modification

void ProcessT(int threadId)
{
    do
    {
      DoWork(); //function which should be executed by every thread

      mutexSetBarrierCounter.lock();
      barrierCounter1++;  //every thread notifies that it finish execution of function
      mutexSetBarrierCounter.unlock();

      if(threadId == 0)
      {
        //main thread (0) awaits for completion of all threads
        while(barrierCounter1!=NumberOfThreads)
        {
        //I assume that the number of threads is lower than the number of processor cores
        //so this loop should not have an impact of overall performance
        }
        //all threads completed: clear the counter first, while the rest of the threads still spin in the else branch,
        //so that no increment from the next loop run can be lost
        barrierCounter1 = 0;
        //then notify the other threads that they can proceed to the next loop run
        for(int i = 0; i<NumberOfThreads; i++)
        {
          barrierThread1[i] = true;
        }
      }
      else
      {
        //rest of threads await the "go" signal
        while(barrierThread1[threadId]==false)
        {

        }
        //if thread is allowed to proceed then it should only clean up its own entry in the barrier thread array
        //no lock is utilized because thread '0' would not modify this value until all threads complete the loop run
        barrierThread1[threadId] = false;
      }
      }
    }
    while(!end);
}
Darqer asked Dec 14 '25 13:12



1 Answer

Locking runs counter to concurrency, and lock contention is the worst case: threads that are blocked waiting on each other do no useful work.

In other words: thread synchronization (in itself) never scales.

Solution: only use synchronization primitives in situations where the contention will be low (the threads need to synchronize "relatively rarely"[1]), or do not try to employ more than one thread for the job that contends for the shared resource.
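As a rough illustration of synchronizing "relatively rarely", the sketch below lets every worker accumulate into a thread-private variable and synchronizes only once, at join(). It uses std::thread rather than Boost threads purely to stay self-contained, and parallelSum is a hypothetical helper invented for this example, not something from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: sums 1..n using numThreads workers.
// Each worker accumulates privately, so no lock is taken while the
// work is running; the only synchronization point is join().
long long parallelSum(long long n, std::size_t numThreads)
{
    assert(numThreads > 0);
    std::vector<long long> partial(numThreads, 0);
    std::vector<std::thread> workers;

    for (std::size_t t = 0; t < numThreads; ++t) {
        workers.emplace_back([&partial, n, numThreads, t] {
            long long local = 0; // thread-private accumulator
            const long long step = static_cast<long long>(numThreads);
            for (long long i = static_cast<long long>(t) + 1; i <= n; i += step)
                local += i;
            partial[t] = local; // a single write to this thread's own slot
        });
    }
    for (auto& w : workers)
        w.join(); // the one place where threads synchronize

    long long total = 0;
    for (long long p : partial)
        total += p;
    return total;
}
```

Whether a workload can be restructured this way depends on whether one loop run really needs the results of all threads from the previous run.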

Your benchmark seems to magnify the very worst-case behavior, by making all threads always wait. If you have a significant workload on all workers between barriers, then the overhead will dwindle, and could easily become insignificant.
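If a barrier per loop run cannot be avoided and, as the question assumes, there are no more threads than cores, a spinning barrier built on std::atomic is one possible approach. Note that volatile, as used in the question, provides neither atomicity nor ordering between threads. The following is only a sketch: SpinBarrier and countViolations are hypothetical names (not Boost APIs), and the counter increment merely stands in for DoWork().

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal reusable spin barrier (hypothetical class, not a Boost API).
class SpinBarrier {
public:
    explicit SpinBarrier(int count)
        : count_(count), waiting_(0), generation_(0) { assert(count > 0); }

    void wait()
    {
        const unsigned gen = generation_.load();
        if (waiting_.fetch_add(1) + 1 == count_) {
            // Last arrival: reset the counter first, then release the others.
            waiting_.store(0);
            generation_.fetch_add(1);
        } else {
            // Spin until the last arrival bumps the generation.
            while (generation_.load() == gen)
                std::this_thread::yield();
        }
    }

private:
    const int count_;
    std::atomic<int> waiting_;
    std::atomic<unsigned> generation_;
};

// Runs `rounds` loop runs on numThreads threads and counts how often a
// thread left the barrier before every thread had finished that run.
int countViolations(int numThreads, int rounds)
{
    SpinBarrier barrier(numThreads);
    std::vector<std::atomic<int>> arrived(rounds);
    for (auto& a : arrived)
        a.store(0);
    std::atomic<int> violations(0);

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&] {
            for (int r = 0; r < rounds; ++r) {
                arrived[r].fetch_add(1); // stands in for DoWork()
                barrier.wait();
                if (arrived[r].load() != numThreads)
                    violations.fetch_add(1);
            }
        });
    }
    for (auto& w : workers)
        w.join();
    return violations.load();
}
```

Spinning burns a core per waiting thread, so this trades CPU time for latency; with more threads than cores, a blocking barrier is likely the better choice.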

  • Trust your profiler
  • Profile only your application code (no silly synthetic benchmarks)
  • Prefer non-threading to threading (remember: asynchrony != concurrency)

[1] "Rarely" here is highly relative and subjective.

sehe answered Dec 17 '25 01:12
