I recently heard about the new C++ standard features std::latch and std::barrier.
I cannot figure out in which situations they are applicable and useful over one another.
A System.Threading.Barrier is a synchronization primitive that enables multiple threads (known as participants) to work concurrently on an algorithm in phases. Each participant executes until it reaches the barrier point in the code.
The class template std::barrier (since C++20) provides a thread-coordination mechanism that blocks a group of threads of known size until all threads in that group have reached the barrier. Unlike std::latch, barriers are reusable: once a group of arriving threads are unblocked, the barrier can be reused.
They're really aimed at quite different goals:
Barriers and latches are often used when you have a pool of worker threads that do some processing and a queue of work items that is shared between them. It's not the only situation where they're used, but it is a very common one and does help illustrate the differences. Here's some example code that would set up some threads like this:
const size_t worker_count = 7; // or whatever
std::vector<std::thread> workers;
std::vector<Proc> procs(worker_count);
Queue<std::function<void(Proc&)>> queue;
for (size_t i = 0; i < worker_count; ++i) {
    workers.push_back(std::thread(
        [p = &procs[i], &queue]() {
            // Keep popping work items until the queue shuts down
            // (pop_back() returns something falsy at that point).
            while (auto fn = queue.pop_back()) {
                fn(*p);
            }
        }
    ));
}
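One thing the example doesn't show is shutdown. Assuming the Queue type (described below) has some way to signal that no more work is coming, such as the hypothetical close() method in the sketch after the next list, the producer would eventually do something like:

// close() is a hypothetical shutdown method (see the Queue sketch below);
// it makes pop_back() return an empty function so the worker loops exit.
queue.close();
for (std::thread& t : workers) {
    t.join();
}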
There are two types that I have assumed exist in that example:
- Proc: a type specific to your application that contains data and logic necessary to process work items. A reference to one is passed to each callback function that's run in the thread pool.
- Queue: a thread-safe blocking queue. There is nothing like this in the C++ standard library (somewhat surprisingly), but there are a lot of open-source libraries containing them, e.g. Folly MPMCQueue or moodycamel::ConcurrentQueue, or you can build a less fancy one yourself with std::mutex, std::condition_variable and std::deque (there are many examples of how to do this if you Google for them; a minimal sketch follows this list).
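For completeness, here's a minimal sketch of the kind of queue I mean, assuming FIFO order and a close() method for shutdown. The class shape and method names are my own invention; the examples here only rely on push_back() plus a pop_back() that returns something falsy once the queue is shut down.

#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical minimal blocking queue; prefer a battle-tested library in
// real code. Named pop_back() to match the code above, though it actually
// pops from the front so items run in FIFO order.
template <typename T>
class Queue {
public:
    void push_back(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        cv_.notify_one();
    }

    // Blocks until an item is available or close() has been called.
    // Returns a default-constructed T (an empty, falsy std::function in
    // these examples) once the queue is closed and drained.
    T pop_back() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty())
            return T{};
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<T> items_;
    bool closed_ = false;
};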
A latch is often used to wait until some work items you push onto the queue have all finished, typically so you can inspect the result.
std::vector<WorkItem> work = get_work();
std::latch latch(work.size());
for (WorkItem& work_item : work) {
    queue.push_back([&work_item, &latch](Proc& proc) {
        proc.do_work(work_item);
        latch.count_down();
    });
}
latch.wait();
// Inspect the completed work
How this works:

- The worker threads pop the work items off of the queue and process them.
- As each work item finishes, latch.count_down() is called, effectively decrementing an internal counter that started at work.size().
- When that counter reaches zero, latch.wait() returns and the producer thread knows that the work items have all been processed.

Notes:

- The count_down() method could be called zero times, one time, or multiple times on each thread, and that number could be different for different threads. For example, even if you push 7 messages onto 7 threads, it might be that all 7 items are processed on the same one thread (rather than one for each thread) and that's fine.
- It's possible that latch.wait() won't be called until after all of the worker threads have already finished processing all of the work items. (This is the sort of odd condition you need to look out for when writing threaded code.) But that's OK, it's not a race condition: latch.wait() will just immediately return in that case.
- An alternative is to spawn dedicated threads for these work items and join them when they finish, rather than pushing them onto the shared queue in this code. That's a perfectly valid strategy too, in fact if anything it's more common, but there are other situations where the latch is more useful.
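To see std::latch on its own, here is a self-contained sketch with none of the thread-pool machinery (the thread count and printouts are arbitrary choices of mine):

#include <cstdio>
#include <latch>
#include <thread>
#include <vector>

int main() {
    const int item_count = 4;
    std::latch latch(item_count);  // internal counter starts at item_count
    std::vector<std::jthread> workers;
    for (int i = 0; i < item_count; ++i) {
        workers.emplace_back([i, &latch] {
            std::printf("item %d processed\n", i);
            latch.count_down();    // single-use: a latch cannot be reset
        });
    }
    latch.wait();                  // blocks until the counter reaches zero
    std::printf("all items processed\n");
}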
A barrier is often used to make all threads wait simultaneously so that the data associated with all of the threads can be operated on simultaneously.
// The completion function must be nothrow-invocable, so a noexcept lambda
// is used here (std::function<void()> wouldn't satisfy std::barrier's
// requirements).
auto completionFn = [&procs]() noexcept {
    // Do something with the whole vector of Proc objects
};
auto barrier = std::make_shared<std::barrier<decltype(completionFn)>>(
    worker_count, completionFn);
auto workerFn = [barrier](Proc&) {
    barrier->arrive_and_wait();
};
for (size_t i = 0; i < worker_count; ++i) {
    queue.push_back(workerFn);
}
How this works:

- The threads pop the workerFn items off of the queue and call arrive_and_wait() on the barrier.
- Once all worker_count threads are waiting, one of them runs completionFn() while the others continue to wait.
- Once completionFn() has finished, all of the threads return from arrive_and_wait() and are free to pop other, unrelated, work items from the queue.

Notes:
- Each thread can only pop one workerFn off of the queue and handle it. Once a thread has popped one off of the queue, it will wait in arrive_and_wait() until all the other copies of workerFn have been popped off by other threads, so there is no chance of it popping another one off.
- In the latch example the producer thread waited on the latch (in latch.wait()), so the latch could simply live on the producer's stack. Here the producer thread doesn't wait for the barrier so we need to manage the memory in a different way, hence the std::shared_ptr.
- If you did want the producer thread to wait for the work to finish, it could call arrive_and_wait() too, but you will obviously need to pass worker_count + 1 to the barrier's constructor. (And then you wouldn't need to use a shared pointer for the barrier.)

!!! DANGER !!!
The last bullet point about other work being pushed onto the queue being "fine" is only the case if that other work doesn't also use a barrier! If you have two different producer threads putting work items with a barrier onto the same queue and those items are interleaved, then some threads will wait on one barrier and others on the other one, and neither will ever reach the required wait count - DEADLOCK. One way to avoid this is to only ever use barriers like this from a single thread, or even to only ever use one barrier in your whole program (this sounds extreme but is actually quite a common strategy, as barriers are often used for one-time initialisation on startup). Another option, if the thread queue you're using supports it, is to atomically push all work items for the barrier onto the queue at once so they're never interleaved with any other work items. (This won't work with the moodycamel queue, which supports pushing multiple items at once but doesn't guarantee that they won't be interleaved with items pushed on by other threads.)
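To illustrate the atomic-push option with the hypothetical Queue sketched earlier, a batch method could push everything under one lock. This is again my own API, not something from moodycamel or any other library:

// A possible addition to the hypothetical Queue<T> above: pushing a whole
// batch under a single lock guarantees the batch is never interleaved
// with items pushed by other producers.
void push_back_batch(std::vector<T> batch) {
    {
        std::lock_guard<std::mutex> lock(mutex_);
        for (T& item : batch)
            items_.push_back(std::move(item));
    }
    cv_.notify_all();
}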
At the point when you asked this question, the proposed experimental API didn't support completion functions. The current API does, but it also allows you to not use one, so I thought I should show an example of how barriers can be used like that too.
auto barrier = std::make_shared<std::barrier<>>(worker_count);
auto workerMainFn = [&procs, barrier](Proc&) {
    barrier->arrive_and_wait();
    // Do something with the whole vector of Proc objects
    barrier->arrive_and_wait();
};
auto workerOtherFn = [barrier](Proc&) {
    barrier->arrive_and_wait(); // Wait for work to start
    barrier->arrive_and_wait(); // Wait for work to finish
};
queue.push_back(std::move(workerMainFn));
for (size_t i = 0; i < worker_count - 1; ++i) {
    queue.push_back(workerOtherFn);
}
How this works:
The key idea is to wait for the barrier twice in each thread, and do the work in between. The first waits have the same purpose as the previous example: they ensure any earlier work items in the queue are finished before starting this work. The second waits ensure that any later items in the queue don't start until this work has finished.
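Here's a self-contained sketch of that double-wait pattern outside the thread-pool setting (the thread count and printouts are arbitrary):

#include <barrier>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int n = 4;
    std::barrier<> barrier(n);
    std::vector<std::jthread> threads;
    // One "main" thread does the work between the two waits...
    threads.emplace_back([&barrier] {
        barrier.arrive_and_wait();      // wait for everyone to be ready
        std::printf("main: operating on the shared data\n");
        barrier.arrive_and_wait();      // release the other threads
    });
    // ...while the others just wait out both phases.
    for (int i = 1; i < n; ++i) {
        threads.emplace_back([i, &barrier] {
            barrier.arrive_and_wait();  // wait for work to start
            barrier.arrive_and_wait();  // wait for work to finish
            std::printf("thread %d: sees the finished work\n", i);
        });
    }
}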
Notes:
The notes are mostly the same as the previous barrier example, but here are some differences:
- You could use a pair of std::latch objects instead of the single barrier, calling count_down() and then wait() in place of arrive_and_wait(). But using a barrier makes more sense, both because calling the combined function is a little simpler and because using a barrier communicates your intention better to future readers of the code.