I've been told several times that I should use std::async with the std::launch::async parameter for fire-and-forget tasks (so that it preferably does its magic on a new thread of execution).
Encouraged by these statements, I wanted to see how std::async compares to sequential execution, detached std::thread objects, and my own naive async implementation.
My naive async implementation looks like this:
template <typename F, typename... Args>
auto myAsync(F&& f, Args&&... args) -> std::future<decltype(f(args...))>
{
    std::packaged_task<decltype(f(args...))()> task(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    auto future = task.get_future();
    std::thread thread(std::move(task));
    thread.detach();
    return future;
}
Nothing fancy here: it packs the functor f and its arguments into a std::packaged_task, launches it on a new std::thread, which is detached, and returns the std::future from the task.
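As a quick illustration of how such a helper might be called with arguments and a return value, here is a minimal sketch. The helper is repeated so the snippet compiles on its own; add and runAddOnDetachedThread are hypothetical names used only for this example.

```cpp
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Same shape as the helper above, repeated so this snippet is self-contained.
template <typename F, typename... Args>
auto myAsync(F&& f, Args&&... args) -> std::future<decltype(f(args...))>
{
    std::packaged_task<decltype(f(args...))()> task(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    auto future = task.get_future();
    std::thread(std::move(task)).detach();
    return future;
}

// Hypothetical worker, used only for this illustration.
int add(int a, int b) { return a + b; }

// Launches add(2, 3) on a detached thread and waits for its result.
int runAddOnDetachedThread()
{
    std::future<int> result = myAsync(add, 2, 3);
    return result.get(); // blocks until the detached thread has stored the value
}
```

Note that the future returned here comes from a std::packaged_task, so destroying it without calling get() would not wait for the thread.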
And now the code measuring execution time with std::chrono::high_resolution_clock:
int main(void)
{
    constexpr unsigned short TIMES = 1000;

    // Sequential execution
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < TIMES; ++i)
    {
        someTask();
    }
    auto dur = std::chrono::high_resolution_clock::now() - start;

    // Detached std::thread per task
    auto tstart = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < TIMES; ++i)
    {
        std::thread t(someTask);
        t.detach();
    }
    auto tdur = std::chrono::high_resolution_clock::now() - tstart;

    // std::async, reusing a single future
    std::future<void> f;
    auto astart = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < TIMES; ++i)
    {
        f = std::async(std::launch::async, someTask);
    }
    auto adur = std::chrono::high_resolution_clock::now() - astart;

    // myAsync, reusing the same future
    auto mastart = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < TIMES; ++i)
    {
        f = myAsync(someTask);
    }
    auto madur = std::chrono::high_resolution_clock::now() - mastart;

    std::cout << "Simple: " << std::chrono::duration_cast<std::chrono::microseconds>(dur).count() <<
    std::endl << "Threaded: " << std::chrono::duration_cast<std::chrono::microseconds>(tdur).count() <<
    std::endl << "std::async: " << std::chrono::duration_cast<std::chrono::microseconds>(adur).count() <<
    std::endl << "My async: " << std::chrono::duration_cast<std::chrono::microseconds>(madur).count() << std::endl;
    return EXIT_SUCCESS;
}
Here someTask() is a simple function that waits a little, simulating some work:
void someTask()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
}
Finally, my results:
Could anyone explain these results? It seems like std::async is much slower than my naive implementation, or than plain and simple detached std::threads. Why is that? Given these results, is there any reason to use std::async?
(Note that I did this benchmark with clang++ and g++ too, and the results were very similar)
UPDATE:
After reading Dave S's answer I updated my little benchmark as follows:
std::future<void> f[TIMES];
auto astart = std::chrono::high_resolution_clock::now();
for (int i = 0; i < TIMES; ++i)
{
    f[i] = std::async(std::launch::async, someTask);
}
auto adur = std::chrono::high_resolution_clock::now() - astart;
So the std::futures are no longer destroyed (and thus joined) on every iteration. After this change, std::async produces results similar to my implementation and to detached std::threads.
std::async can also start threads. Compared to std::thread it is considered less powerful, but it is easier to use when you just want to run a function asynchronously.
So if you want to make sure that the work is done asynchronously, use std::launch::async.
@user2485710 It needs to block when you retrieve the result, if you need the result in the launching thread. It cannot use a result that is not ready yet, so when you call get() you have to wait until it is.
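To make the blocking behaviour concrete, here is a small sketch. It holds a task back with a std::promise "gate", checks that the future is not ready yet, then releases the task and calls get(). GateDemoResult and waitThenGet are hypothetical names, and the gate is only there so the outcome does not depend on timing.

```cpp
#include <chrono>
#include <future>

struct GateDemoResult {
    bool pendingBeforeRelease; // wait_for reported timeout while the task was held back
    int value;                 // what get() eventually returned
};

GateDemoResult waitThenGet()
{
    std::promise<void> gate;
    std::future<void> opened = gate.get_future();

    // The task cannot finish until the gate is opened.
    std::future<int> answer = std::async(std::launch::async, [&opened] {
        opened.wait();
        return 42;
    });

    // Polling with a zero timeout does not block; the result is not ready yet.
    const bool pending =
        answer.wait_for(std::chrono::milliseconds(0)) == std::future_status::timeout;

    gate.set_value();               // let the task run to completion
    return {pending, answer.get()}; // get() blocks until the value is available
}
```

If the launching thread has other work to do, polling with wait_for like this is one way to avoid sitting in get().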
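Some implementations (Microsoft's, for example) run std::async tasks on a thread pool under the hood, but the standard does not require this. libstdc++ and libc++ typically start a new thread for every std::launch::async call, so do not count on pooling in portable code.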
The function template std::async runs the function f asynchronously (potentially in a separate thread, which might be part of a thread pool) and returns a std::future that will eventually hold the result of that function call. The overload that takes no launch policy behaves as if it were called with the policy std::launch::async | std::launch::deferred.
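The effect of the launch policy can be observed by comparing thread ids. This is a minimal sketch; ranOnCallingThread is a hypothetical helper name. With std::launch::deferred the task runs lazily on the thread that calls get(), while std::launch::async runs it on a different thread. The default policy lets the implementation pick either, so its result is not something to rely on.

```cpp
#include <future>
#include <thread>

// True if a task launched with `policy` ran on the thread that called get().
bool ranOnCallingThread(std::launch policy)
{
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> worker =
        std::async(policy, [] { return std::this_thread::get_id(); });
    return worker.get() == caller;
}
```

Calling ranOnCallingThread(std::launch::deferred) yields true and ranOnCallingThread(std::launch::async) yields false.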
One key difference is that the future returned by std::async joins the thread when the future is destroyed or, in your case, replaced with a new value. This means it has to execute someTask() and join the thread, both of which take time. None of your other tests do that; they simply spawn the threads and move on.
std::async returns a special std::future. This future has a ~future that does a .wait().
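A short sketch of that destructor behaviour, assuming nothing beyond the standard library (futureDestructorWaited is a hypothetical name): the future goes out of scope without anyone calling get() or wait(), yet the task has finished by the time the scope is left.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Returns whether the task had finished once its future went out of scope.
bool futureDestructorWaited()
{
    std::atomic<bool> done{false};
    {
        std::future<void> f = std::async(std::launch::async, [&done] {
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
            done = true;
        });
        // No get() or wait() here. f's destructor runs at the closing brace
        // and blocks until the task has completed.
    }
    return done.load();
}
```

A future obtained from a std::packaged_task or std::promise does not behave this way, which is why the detached-thread variants return immediately.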
So your examples are fundamentally different. The slow ones actually run the tasks during your timing. The fast ones just queue up the tasks and throw away any way of knowing when a task is done. As the behaviour of programs that let threads outlive main is unpredictable, one should avoid it.
The right way to compare the tasks is to store the resulting future when generating them, and before the timer ends either .wait()/.join() them all, or avoid destroying the objects until after the timer expires. This last case, however, makes the sequential version look worse than it is.
You do need to join/wait before starting the next test, as otherwise you are stealing resources from their timing.
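One way this advice could be applied is sketched below: every future is stored, and all of them are waited on before the clock is read, so the measurement covers the work itself and nothing leaks into the next test. BatchResult and timeAsyncBatch are hypothetical names, and std::chrono::steady_clock is my substitution for the question's high_resolution_clock.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

struct BatchResult {
    std::chrono::microseconds elapsed; // launch plus completion of all tasks
    int completed;                     // number of tasks that actually ran
};

BatchResult timeAsyncBatch(int times)
{
    std::atomic<int> completed{0};
    std::vector<std::future<void>> futures;
    futures.reserve(static_cast<std::size_t>(times));

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < times; ++i) {
        // Keep every future so that no destructor blocks inside the loop.
        futures.push_back(std::async(std::launch::async, [&completed] {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            ++completed;
        }));
    }
    for (auto& f : futures) {
        f.wait(); // all tasks must be done before the timer stops
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;

    return {std::chrono::duration_cast<std::chrono::microseconds>(elapsed),
            completed.load()};
}
```

The same pattern works for plain threads by storing std::thread objects and calling join() on each instead of detaching them.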
Note that moving a future removes the wait from the source: the moved-from future no longer blocks in its destructor.
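A minimal sketch of the point about moved futures (movedFromFutureDoesNotBlock is a hypothetical name). The task is held back by a std::promise gate, so if the moved-from future still waited in its destructor, the function would never return.

```cpp
#include <future>
#include <utility>
#include <vector>

// Returns true if the source future was left empty after being moved from.
bool movedFromFutureDoesNotBlock()
{
    std::promise<void> gate;
    std::future<void> opened = gate.get_future();
    std::vector<std::future<void>> keep;
    bool sourceEmptied = false;
    {
        std::future<void> f =
            std::async(std::launch::async, [&opened] { opened.wait(); });
        keep.push_back(std::move(f)); // the shared state now belongs to keep
        sourceEmptied = !f.valid();   // the moved-from future refers to nothing
        // f is destroyed here without blocking, although the task is still
        // waiting on the gate.
    }
    gate.set_value();    // release the task
    keep.front().wait(); // the wait now belongs to the moved-to future
    return sourceEmptied;
}
```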