I was reading Chapter 8 of the "Modern C++ Programming Cookbook, 2nd edition" on concurrency and stumbled upon something that puzzles me. The author implements different versions of parallel map and reduce functions using <code>std::thread</code> and <code>std::async</code>. The implementations are really close; for example, the heart of the <code>parallel_map</code> functions are <pre class="prettyprint"><code>// parallel_map using std::async ... tasks.emplace_back(std::async( std::launch::async, [=, &f] {std::transform(begin, last, begin, std::forward<F>(f)); })); ... // parallel_map using std::thread ... threads.emplace_back([=, &f] {std::transform(begin, last, begin, std::forward<F>(f)); }); ... </code></pre> The complete code can be found here for <code>std::thread</code> and there for <code>std::async</code>. What puzzles me is that the computation times reported in the book give a significant and consistent advantage to the <code>std::async</code> implementation. Moreover, the author acknowledge this fact as being obvious, without providing any hint of justification: <blockquote> If we compare this [result with async] with the results from the parallel version using threads, we will find that these are faster execution times and that the speedup is significant, especially for the <code>fold</code> function. </blockquote> I ran the code above on my computer, and even though the differences are not as compelling as in the book, I find that the <code>std::async</code> implementation is indeed faster than the <code>std::thread</code> one. (The author also later brings in standard implementations of these algorithms, which are even faster). On my computer, the code runs with four threads, which corresponds to the number of physical cores of my CPU. Maybe I missed something, but why is it obvious that <code>std::async</code> should run faster than <code>std::thread</code> on this example? My intuition was that <code>std::async</code> being a higher-level implementation of threads, it should take at least the same amount of time, if not more, than threads -- obviously I was wrong. Are those findings consistent, as suggested by the book, and what is the explanation?

My original interpretation was incorrect. Refer to @OznOg's answer below. Modified Answer: I created a simple benchmark that uses <code>std::async</code> and <code>std::thread</code> to do some tiny tasks: <pre class="prettyprint lang-cpp prettyprint-override"><code>#include <thread> #include <chrono> #include <vector> #include <future> #include <iostream> __thread volatile int you_shall_not_optimize_this; void work() { // This is the simplest way I can think of to prevent the compiler and // operating system from doing naughty things you_shall_not_optimize_this = 42; } [[gnu::noinline]] std::chrono::nanoseconds benchmark_threads(size_t count) { std::vector<std::optional<std::thread>> threads; threads.resize(count); auto before = std::chrono::high_resolution_clock::now(); for (size_t i = 0; i < count; ++i) threads[i] = std::thread { work }; for (size_t i = 0; i < count; ++i) threads[i]->join(); threads.clear(); auto after = std::chrono::high_resolution_clock::now(); return after - before; } [[gnu::noinline]] std::chrono::nanoseconds benchmark_async(size_t count, std::launch policy) { std::vector<std::optional<std::future<void>>> results; results.resize(count); auto before = std::chrono::high_resolution_clock::now(); for (size_t i = 0; i < count; ++i) results[i] = std::async(policy, work); for (size_t i = 0; i < count; ++i) results[i]->wait(); results.clear(); auto after = std::chrono::high_resolution_clock::now(); return after - before; } std::ostream& operator<<(std::ostream& stream, std::launch value) { if (value == std::launch::async) return stream << "std::launch::async"; else if (value == std::launch::deferred) return stream << "std::launch::deferred"; else return stream << "std::launch::unknown"; } // #define CONFIG_THREADS true // #define CONFIG_ITERATIONS 10000 // #define CONFIG_POLICY std::launch::async int main() { std::cout << "Running benchmark:\n" << " threads? " << std::boolalpha << CONFIG_THREADS << '\n' << " iterations " << CONFIG_ITERATIONS << '\n' << " async policy " << CONFIG_POLICY << std::endl; std::chrono::nanoseconds duration; if (CONFIG_THREADS) { duration = benchmark_threads(CONFIG_ITERATIONS); } else { duration = benchmark_async(CONFIG_ITERATIONS, CONFIG_POLICY); } std::cout << "Completed in " << duration.count() << "ns (" << std::chrono::duration_cast<std::chrono::milliseconds>(duration).count() << "ms)\n"; } </code></pre> I've run the benchmark as follows: <pre class="prettyprint lang-none prettyprint-override"><code>$ g++ -Wall -Wextra -std=c++20 -pthread -O3 -DCONFIG_THREADS=false -DCONFIG_ITERATIONS=10000 -DCONFIG_POLICY=std::launch::deferred main.cpp -o main && ./main Running benchmark: threads? false iterations 10000 async policy std::launch::deferred Completed in 4783327ns (4ms) $ g++ -Wall -Wextra -std=c++20 -pthread -O3 -DCONFIG_THREADS=false -DCONFIG_ITERATIONS=10000 -DCONFIG_POLICY=std::launch::async main.cpp -o main && ./main Running benchmark: threads? false iterations 10000 async policy std::launch::async Completed in 301756775ns (301ms) $ g++ -Wall -Wextra -std=c++20 -pthread -O3 -DCONFIG_THREADS=true -DCONFIG_ITERATIONS=10000 -DCONFIG_POLICY=std::launch::deferred main.cpp -o main && ./main Running benchmark: threads? true iterations 10000 async policy std::launch::deferred Completed in 291284997ns (291ms) $ g++ -Wall -Wextra -std=c++20 -pthread -O3 -DCONFIG_THREADS=true -DCONFIG_ITERATIONS=10000 -DCONFIG_POLICY=std::launch::async main.cpp -o main && ./main Running benchmark: threads? true iterations 10000 async policy std::launch::async Completed in 293539858ns (293ms) </code></pre> I re-ran all the benchmarks with <code>strace</code> attached and accumulated the system calls made: <pre class="prettyprint lang-none prettyprint-override"><code># std::async with std::launch::async 1 access 2 arch_prctl 36 brk 10000 clone 6 close 1 execve 1 exit_group 10002 futex 10028 mmap 10009 mprotect 9998 munmap 7 newfstatat 6 openat 7 pread64 1 prlimit64 5 read 2 rt_sigaction 20001 rt_sigprocmask 1 set_robust_list 1 set_tid_address 5 write # std::async with std::launch::deferred 1 access 2 arch_prctl 11 brk 6 close 1 execve 1 exit_group 10002 futex 28 mmap 9 mprotect 2 munmap 7 newfstatat 6 openat 7 pread64 1 prlimit64 5 read 2 rt_sigaction 1 rt_sigprocmask 1 set_robust_list 1 set_tid_address 5 write # std::thread with std::launch::async 1 access 2 arch_prctl 27 brk 10000 clone 6 close 1 execve 1 exit_group 2 futex 10028 mmap 10009 mprotect 9998 munmap 7 newfstatat 6 openat 7 pread64 1 prlimit64 5 read 2 rt_sigaction 20001 rt_sigprocmask 1 set_robust_list 1 set_tid_address 5 write # std::thread with std::launch::deferred 1 access 2 arch_prctl 27 brk 10000 clone 6 close 1 execve 1 exit_group 2 futex 10028 mmap 10009 mprotect 9998 munmap 7 newfstatat 6 openat 7 pread64 1 prlimit64 5 read 2 rt_sigaction 20001 rt_sigprocmask 1 set_robust_list 1 set_tid_address 5 write </code></pre> We observe that <code>std::async</code> is significantly faster with <code>std::launch::deferred</code> but that everything else doesn't seem to matter as much. My conclusions are: <ul> <li> The current libstdc++ implementation does not take advantage of the fact that <code>std::async</code> doesn't need a new thread for each task. </li> <li> The current libstdc++ implementation does some sort of locking in <code>std::async</code> that <code>std::thread</code> doesn't do. </li> <li> <code>std::async</code> with <code>std::launch::deferred</code> saves setup and destroy costs and is much faster for this case. </li> </ul> My machine is configured as follows: <pre class="prettyprint lang-none prettyprint-override"><code>$ uname -a Linux linux-2 5.12.1-arch1-1 #1 SMP PREEMPT Sun, 02 May 2021 12:43:58 +0000 x86_64 GNU/Linux $ g++ --version g++ (GCC) 10.2.0 Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu # truncated Architecture: x86_64 Byte Order: Little Endian CPU(s): 8 Model name: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz </code></pre> Original Answer: <code>std::thread</code> is a wrapper for thread objects which are provided by the operating system, they are extremely expensive to create and destroy. <code>std::async</code> is similar, but there isn't a 1-to-1 mapping between tasks and operating system threads. This could be implemented with thread pools, where threads are reused for multiple tasks. So <code>std::async</code> is better if you have many small tasks, and <code>std::thread</code> is better if you have a few tasks that are running for long periods of time. Also if you have things that truly need to happen in parallel, then <code>std::async</code> might not fit very well. (<code>std::thread</code> also can't make such guarantees, but that's the closest you can get.) <hr> Maybe to clarify, in your case <code>std::async</code> saves the overhead from creating and destroying threads. (Depending on the operating system, you could also lose performance simply by having a lot of threads running. An operating system might have a scheduling strategy where it tries to guarantee that every thread gets executed every so often, thus the scheduler could decide go give the individual threads smaller slices of processing time, thus creating more overhead for switching between threads.)

Looks like something is not happening like expected. I compiled the whole thing on my fedora and the first results where surprising. I commented out all tests to keep only the 2 ones comparing threads and async. The output looks like confirming the behaviour: <pre class="prettyprint"><code> Thead version result size s map p map s fold p fold 10000 642 628 751 770 100000 6533 3444 7985 3338 500000 14885 5760 13854 6304 1000000 23428 11398 27795 12129 2000000 47136 22468 55518 24154 5000000 118690 55752 139489 60142 10000000 236496 112467 277413 121002 25000000 589277 276750 694742 297832 500000001.17839e+06 5553181.39065e+06 594102 Async version: size s map p1 map p2 map s fold p1 fold p2 fold 10000 248 232 231 273 282 273 100000 2323 1562 2827 2757 1536 1766 500000 12312 5615 12044 14014 6272 7431 1000000 23585 11701 24060 27851 12376 14109 2000000 47147 22796 48035 55433 25565 30095 5000000 118465 59980 119698 140775 62960 68382 10000000 241727 110883 239554 277958 121205 136041 </code></pre> Looks like the async is actually 2 times faster (for small vallues) than the threaded one. Then I used strace to count the number of <code>clone</code> system call done (number of thread created): <pre class="prettyprint"><code> 64 clone with threads 92 clone with async </code></pre> So looks like the explaination on the time spend to create thread is somehow in contradiction as he async version actually creates as many threads than the thread based one (the difference comes from the fact that there are two version of fold in the async code). Then I tried to swap both test order of execution (put async before thread), and here are the results: <pre class="prettyprint"><code> size s map p1 map p2 map s fold p1 fold p2 fold 10000 653 694 624 718 748 718 100000 6731 3931 2978 8533 3116 1724 500000 12406 5839 14589 13895 8427 7072 1000000 23813 11578 24099 27853 13091 14108 2000000 47357 22402 48197 55469 24572 33543 5000000 117923 55869 120303 139061 61801 68281 10000000 234861 111055 239124 277153 121270 136953 size s map p map s fold p fold 10000 232 232 273 328 100000 6424 3271 8297 4487 500000 21329 5547 13913 6263 1000000 23654 11419 27827 12083 2000000 47230 22763 55653 24135 5000000 117448 56785 139286 61679 10000000 235394 111021 278177 119805 25000000 589329 279637 696392 301485 500000001.1824e+06 5564431.38722e+06 606279 </code></pre> So now the "thread" version is 2 time faster than async for small values. Looking at clone calls di not show any differences: <pre class="prettyprint"><code> 92 clone 64 clone </code></pre> I didn't have much time to go further, but at least on linux, we can consider that there are no differences between the two version (the async could even be seen as less efficient as it requires more threads). We can see that it has nothing to do with the async/thread problem. Moreover, if we look at values that realy need computation time, the difference of time is really small and not relevent: 55752us vs 56785us for 5'000'000, and keeps matching for bigger values. This looks like usual problem with microbenches, we somehow measure latencies of the system more than the computation time itself. Note: the figures shown are without optimization (original code) ; adding <code>-O3</code> obviously speeds up the computations, but results show the same: there is not real difference in computation time on big values.

Why would concurrency using std::async be faster than using std::thread?

Tags:

c++

multithreading

I was reading Chapter 8 of the "Modern C++ Programming Cookbook, 2nd edition" on concurrency and stumbled upon something that puzzles me.

The author implements different versions of parallel map and reduce functions using std::thread and std::async. The implementations are really close; for example, the heart of the parallel_map functions are

// parallel_map using std::async
...
tasks.emplace_back(std::async(
  std::launch::async,
  [=, &f] {std::transform(begin, last, begin, std::forward<F>(f)); }));
...

// parallel_map using std::thread
...
threads.emplace_back([=, &f] {std::transform(begin, last, begin, std::forward<F>(f)); });
...

The complete code can be found here for std::thread and there for std::async.

What puzzles me is that the computation times reported in the book give a significant and consistent advantage to the std::async implementation. Moreover, the author acknowledge this fact as being obvious, without providing any hint of justification:

If we compare this [result with async] with the results from the parallel version using threads, we will find that these are faster execution times and that the speedup is significant, especially for the fold function.

I ran the code above on my computer, and even though the differences are not as compelling as in the book, I find that the std::async implementation is indeed faster than the std::thread one. (The author also later brings in standard implementations of these algorithms, which are even faster). On my computer, the code runs with four threads, which corresponds to the number of physical cores of my CPU.

Maybe I missed something, but why is it obvious that std::async should run faster than std::thread on this example? My intuition was that std::async being a higher-level implementation of threads, it should take at least the same amount of time, if not more, than threads -- obviously I was wrong. Are those findings consistent, as suggested by the book, and what is the explanation?

709

asked Apr 10 '21 13:04

user209974

2 Answers

My original interpretation was incorrect. Refer to @OznOg's answer below.

Modified Answer:

I created a simple benchmark that uses std::async and std::thread to do some tiny tasks:

#include <thread>
#include <chrono>
#include <vector>
#include <future>
#include <iostream>

__thread volatile int you_shall_not_optimize_this;

void work() {
    // This is the simplest way I can think of to prevent the compiler and
    // operating system from doing naughty things
    you_shall_not_optimize_this = 42;
}

[[gnu::noinline]]
std::chrono::nanoseconds benchmark_threads(size_t count) {
    std::vector<std::optional<std::thread>> threads;
    threads.resize(count);

    auto before = std::chrono::high_resolution_clock::now();

    for (size_t i = 0; i < count; ++i)
        threads[i] = std::thread { work };

    for (size_t i = 0; i < count; ++i)
        threads[i]->join();

    threads.clear();

    auto after = std::chrono::high_resolution_clock::now();

    return after - before;
}

[[gnu::noinline]]
std::chrono::nanoseconds benchmark_async(size_t count, std::launch policy) {
    std::vector<std::optional<std::future<void>>> results;
    results.resize(count);

    auto before = std::chrono::high_resolution_clock::now();

    for (size_t i = 0; i < count; ++i)
        results[i] = std::async(policy, work);

    for (size_t i = 0; i < count; ++i)
        results[i]->wait();

    results.clear();

    auto after = std::chrono::high_resolution_clock::now();

    return after - before;
}

std::ostream& operator<<(std::ostream& stream, std::launch value)
{
    if (value == std::launch::async)
        return stream << "std::launch::async";
    else if (value == std::launch::deferred)
        return stream << "std::launch::deferred";
    else
        return stream << "std::launch::unknown";
}

// #define CONFIG_THREADS true
// #define CONFIG_ITERATIONS 10000
// #define CONFIG_POLICY std::launch::async

int main() {
    std::cout << "Running benchmark:\n"
              << "  threads?     " << std::boolalpha << CONFIG_THREADS << '\n'
              << "  iterations   " << CONFIG_ITERATIONS << '\n'
              << "  async policy " << CONFIG_POLICY << std::endl;

    std::chrono::nanoseconds duration;
    if (CONFIG_THREADS) {
        duration = benchmark_threads(CONFIG_ITERATIONS);
    } else {
        duration = benchmark_async(CONFIG_ITERATIONS, CONFIG_POLICY);
    }

    std::cout << "Completed in " << duration.count() << "ns (" << std::chrono::duration_cast<std::chrono::milliseconds>(duration).count() << "ms)\n";
}

I've run the benchmark as follows:

$ g++ -Wall -Wextra -std=c++20 -pthread -O3 -DCONFIG_THREADS=false -DCONFIG_ITERATIONS=10000 -DCONFIG_POLICY=std::launch::deferred main.cpp -o main && ./main
Running benchmark:
  threads?     false
  iterations   10000
  async policy std::launch::deferred
Completed in 4783327ns (4ms)
$ g++ -Wall -Wextra -std=c++20 -pthread -O3 -DCONFIG_THREADS=false -DCONFIG_ITERATIONS=10000 -DCONFIG_POLICY=std::launch::async main.cpp -o main && ./main
Running benchmark:
  threads?     false
  iterations   10000
  async policy std::launch::async
Completed in 301756775ns (301ms)
$ g++ -Wall -Wextra -std=c++20 -pthread -O3 -DCONFIG_THREADS=true -DCONFIG_ITERATIONS=10000 -DCONFIG_POLICY=std::launch::deferred main.cpp -o main && ./main
Running benchmark:
  threads?     true
  iterations   10000
  async policy std::launch::deferred
Completed in 291284997ns (291ms)
$ g++ -Wall -Wextra -std=c++20 -pthread -O3 -DCONFIG_THREADS=true -DCONFIG_ITERATIONS=10000 -DCONFIG_POLICY=std::launch::async main.cpp -o main && ./main
Running benchmark:
  threads?     true
  iterations   10000
  async policy std::launch::async
Completed in 293539858ns (293ms)

I re-ran all the benchmarks with strace attached and accumulated the system calls made:

# std::async with std::launch::async
      1 access
      2 arch_prctl
     36 brk
  10000 clone
      6 close
      1 execve
      1 exit_group
  10002 futex
  10028 mmap
  10009 mprotect
   9998 munmap
      7 newfstatat
      6 openat
      7 pread64
      1 prlimit64
      5 read
      2 rt_sigaction
  20001 rt_sigprocmask
      1 set_robust_list
      1 set_tid_address
      5 write

# std::async with std::launch::deferred
      1 access
      2 arch_prctl
     11 brk
      6 close
      1 execve
      1 exit_group
  10002 futex
     28 mmap
      9 mprotect
      2 munmap
      7 newfstatat
      6 openat
      7 pread64
      1 prlimit64
      5 read
      2 rt_sigaction
      1 rt_sigprocmask
      1 set_robust_list
      1 set_tid_address
      5 write

# std::thread with std::launch::async
      1 access
      2 arch_prctl
     27 brk
  10000 clone
      6 close
      1 execve
      1 exit_group
      2 futex
  10028 mmap
  10009 mprotect
   9998 munmap
      7 newfstatat
      6 openat
      7 pread64
      1 prlimit64
      5 read
      2 rt_sigaction
  20001 rt_sigprocmask
      1 set_robust_list
      1 set_tid_address
      5 write

# std::thread with std::launch::deferred
      1 access
      2 arch_prctl
     27 brk
  10000 clone
      6 close
      1 execve
      1 exit_group
      2 futex
  10028 mmap
  10009 mprotect
   9998 munmap
      7 newfstatat
      6 openat
      7 pread64
      1 prlimit64
      5 read
      2 rt_sigaction
  20001 rt_sigprocmask
      1 set_robust_list
      1 set_tid_address
      5 write

We observe that std::async is significantly faster with std::launch::deferred but that everything else doesn't seem to matter as much.

My conclusions are:

The current libstdc++ implementation does not take advantage of the fact that std::async doesn't need a new thread for each task.
The current libstdc++ implementation does some sort of locking in std::async that std::thread doesn't do.
std::async with std::launch::deferred saves setup and destroy costs and is much faster for this case.

My machine is configured as follows:

$ uname -a
Linux linux-2 5.12.1-arch1-1 #1 SMP PREEMPT Sun, 02 May 2021 12:43:58 +0000 x86_64 GNU/Linux
$ g++ --version
g++ (GCC) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ lscpu # truncated
Architecture:                    x86_64
Byte Order:                      Little Endian
CPU(s):                          8
Model name:                      Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz

Original Answer:

std::thread is a wrapper for thread objects which are provided by the operating system, they are extremely expensive to create and destroy.

std::async is similar, but there isn't a 1-to-1 mapping between tasks and operating system threads. This could be implemented with thread pools, where threads are reused for multiple tasks.

So std::async is better if you have many small tasks, and std::thread is better if you have a few tasks that are running for long periods of time.

Also if you have things that truly need to happen in parallel, then std::async might not fit very well. (std::thread also can't make such guarantees, but that's the closest you can get.)

Maybe to clarify, in your case std::async saves the overhead from creating and destroying threads.

(Depending on the operating system, you could also lose performance simply by having a lot of threads running. An operating system might have a scheduling strategy where it tries to guarantee that every thread gets executed every so often, thus the scheduler could decide go give the individual threads smaller slices of processing time, thus creating more overhead for switching between threads.)

196

answered Oct 17 '22 09:10

asynts

Looks like something is not happening like expected. I compiled the whole thing on my fedora and the first results where surprising. I commented out all tests to keep only the 2 ones comparing threads and async. The output looks like confirming the behaviour:

 Thead version result

    size   s map   p map  s fold  p fold
   10000     642     628     751     770
  100000    6533    3444    7985    3338
  500000   14885    5760   13854    6304
 1000000   23428   11398   27795   12129
 2000000   47136   22468   55518   24154
 5000000  118690   55752  139489   60142
10000000  236496  112467  277413  121002
25000000  589277  276750  694742  297832
500000001.17839e+06  5553181.39065e+06  594102

Async version:
    size   s map  p1 map  p2 map  s fold p1 fold p2 fold
   10000     248     232     231     273     282     273
  100000    2323    1562    2827    2757    1536    1766
  500000   12312    5615   12044   14014    6272    7431
 1000000   23585   11701   24060   27851   12376   14109
 2000000   47147   22796   48035   55433   25565   30095
 5000000  118465   59980  119698  140775   62960   68382
10000000  241727  110883  239554  277958  121205  136041

Looks like the async is actually 2 times faster (for small vallues) than the threaded one. Then I used strace to count the number of clone system call done (number of thread created):

 64 clone with threads
 92 clone with async

So looks like the explaination on the time spend to create thread is somehow in contradiction as he async version actually creates as many threads than the thread based one (the difference comes from the fact that there are two version of fold in the async code).

Then I tried to swap both test order of execution (put async before thread), and here are the results:

    size   s map  p1 map  p2 map  s fold p1 fold p2 fold
   10000     653     694     624     718     748     718
  100000    6731    3931    2978    8533    3116    1724
  500000   12406    5839   14589   13895    8427    7072
 1000000   23813   11578   24099   27853   13091   14108
 2000000   47357   22402   48197   55469   24572   33543
 5000000  117923   55869  120303  139061   61801   68281
10000000  234861  111055  239124  277153  121270  136953
    size   s map   p map  s fold  p fold
   10000     232     232     273     328
  100000    6424    3271    8297    4487
  500000   21329    5547   13913    6263
 1000000   23654   11419   27827   12083
 2000000   47230   22763   55653   24135
 5000000  117448   56785  139286   61679
10000000  235394  111021  278177  119805
25000000  589329  279637  696392  301485
500000001.1824e+06  5564431.38722e+06  606279

So now the "thread" version is 2 time faster than async for small values.

Looking at clone calls di not show any differences:

 92 clone
 64 clone

I didn't have much time to go further, but at least on linux, we can consider that there are no differences between the two version (the async could even be seen as less efficient as it requires more threads).

We can see that it has nothing to do with the async/thread problem.

Moreover, if we look at values that realy need computation time, the difference of time is really small and not relevent: 55752us vs 56785us for 5'000'000, and keeps matching for bigger values.

This looks like usual problem with microbenches, we somehow measure latencies of the system more than the computation time itself.

Note: the figures shown are without optimization (original code) ; adding -O3 obviously speeds up the computations, but results show the same: there is not real difference in computation time on big values.

answered Oct 17 '22 07:10

OznOg

Related questions
                            
                                How to specify mnemonics (ampersand shortcut) to QActions in a QMenu?
                            
                                rvalue-reference to array: can it actually happen?
                            
                                Compiling clang-llvm examples
                            
                                Reconstruction a 3D point from multiple 2D points?
                            
                                warning: conversion to 'double' from 'long int' may alter its value
                            
                                C++ Ruby Extension with External Libraries
                            
                                Pre-allocating memory in MATLAB à la std::vector::reserve(n)
                            
                                Is boost::interprocess threadsafe?
                            
                                Anti-hacking a game - best practices, suggestions
                            
                                LabVIEW, C++ DLL, and IMAQ Images
                            
                                Failed writing body in libcurl
                            
                                Why is it not using my overloaded operator for ++?
                            
                                How to instruct the compiler to generate an alias for a virtual function?
                            
                                Passing a 2d dynamic array to a function in C++
                            
                                Can't open .rc files in Visual Studio for editing, app compiles fine
                            
                                Does an empty value pack expansion match a type pack or optional type parameter?
                            
                                what's the best way to write a pluginable application?
                            
                                Why is the size of this struct 24?
                            
                                Initialize large two dimensional array in C++
                            
                                Why does const have to be added to constexpr for a string literal declaration?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With