I have a performance issue with boost::barrier. I measured the time taken by calls to its wait method: with a single thread, about 100000 repeated calls to wait take around 0.5 seconds. With two threads this grows to 3 seconds, and it gets worse with every additional thread (I have an 8-core processor).
I implemented a custom method that provides the same functionality, and it is much faster.
Is it normal for this method to be so slow? Is there a faster way to synchronize threads in Boost (so that all threads wait until every thread has completed the current job and then proceed to the next task; only synchronization, no data transfer is required)?
I have been asked for my current code. What I want to achieve: in a loop I run a function whose work can be split across many threads, but all threads must finish the current loop run before the next run starts.
My current solution:
volatile int barrierCounter1 = 0; // number of threads that have completed the current loop run
volatile bool barrierThread1[NumberOfThreads]; // "go" signal for each thread with id > 0; all values start as false
boost::mutex mutexSetBarrierCounter; // protects modification of barrierCounter1

void ProcessT(int threadId)
{
    do
    {
        DoWork(); // function which should be executed by every thread

        mutexSetBarrierCounter.lock();
        barrierCounter1++; // every thread reports that it has finished executing the function
        mutexSetBarrierCounter.unlock();

        if(threadId == 0)
        {
            // main thread (0) waits for completion of all threads
            while(barrierCounter1 != NumberOfThreads)
            {
                // I assume that the number of threads is lower than the number of processor cores,
                // so this loop should not have an impact on overall performance
            }
            // clear the counter before releasing anyone; no lock is needed because
            // the rest of the threads are still waiting in the else branch
            barrierCounter1 = 0;
            // all threads completed, so notify the other threads that they can proceed to the next loop run
            for(int i = 1; i < NumberOfThreads; i++)
            {
                barrierThread1[i] = true;
            }
        }
        else
        {
            // the rest of the threads wait for the "go" signal
            while(barrierThread1[threadId] == false)
            {
            }
            // once allowed to proceed, the thread only has to reset its own entry in the array;
            // no lock is needed because thread 0 will not modify this value until all threads complete the loop run
            barrierThread1[threadId] = false;
        }
    }
    while(!end);
}
Locking runs counter to concurrency. Lock contention is always the worst-case behaviour.
In other words: thread synchronization (in itself) never scales.
Solution: only use synchronization primitives in situations where the contention will be low (the threads need to synchronize "relatively rarely"[1]), or do not try to employ more than one thread for the job that contends for the shared resource.
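Your benchmark seems to magnify the very worst-case behavior, by making all threads always wait. If you have a significant workload on all workers between barriers, then the overhead will dwindle, and could easily become insignificant. For example, the 3 seconds you measured over 100000 waits is about 30 microseconds per barrier; if each DoWork call took 1 millisecond, that would be roughly 3% overhead.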
[1] Which is highly relative and subjective
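If the work between barriers really is tiny and you have at least as many cores as threads, a mutex-free spinning barrier may be worth measuring against boost::barrier. The following is only a minimal sketch of a sense-reversing spin barrier built on std::atomic (C++11). SpinBarrier and run_rounds are hypothetical names made up for this illustration and are not part of Boost, and the atomic counter done merely stands in for your DoWork(). It uses the default sequentially consistent ordering for simplicity rather than tuned memory orders.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not a Boost class): a sense-reversing spin barrier.
class SpinBarrier {
public:
    explicit SpinBarrier(int count) : count_(count), waiting_(0), sense_(false) {}

    void wait() {
        // Value that sense_ will take once every thread has arrived.
        const bool release = !sense_.load();
        if (waiting_.fetch_add(1) + 1 == count_) {
            // Last arrival: reset the counter first, then release the others.
            waiting_.store(0);
            sense_.store(release);
        } else {
            // Spin without a mutex; yield so oversubscribed cores still progress.
            while (sense_.load() != release) {
                std::this_thread::yield();
            }
        }
    }

private:
    const int count_;
    std::atomic<int> waiting_;
    std::atomic<bool> sense_;
};

// Runs `rounds` barrier-separated steps on `threads` threads and returns how
// often a thread passed the barrier before all work of that round was done.
int run_rounds(int threads, int rounds) {
    assert(threads > 0 && rounds >= 0);
    SpinBarrier barrier(threads);
    std::atomic<int> done(0);        // stands in for DoWork(): counts finished steps
    std::atomic<int> violations(0);

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&barrier, &done, &violations, threads, rounds] {
            for (int r = 0; r < rounds; ++r) {
                done.fetch_add(1);   // "work" for round r
                barrier.wait();
                // Every thread must have finished round r by now.
                if (done.load() < (r + 1) * threads) {
                    violations.fetch_add(1);
                }
            }
        });
    }
    for (std::thread& th : pool) {
        th.join();
    }
    return violations.load();
}
```

Calling run_rounds(4, 1000) is expected to return 0, meaning no thread got ahead of the barrier. Note that spinning burns a core per waiting thread, so this only pays off when waits are short; whether it actually beats boost::barrier on your machine is something you would have to measure.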