Python-like multiprocessing in C++

I am new to C++ and come from a long Python background.

I am looking for a way to run a function in parallel in C++. I have read a lot about std::async, but it is still not very clear to me.

  1. The following code does something really interesting:

    #include <future>
    #include <iostream>
    void called_from_async() {
      std::cout << "Async call" << std::endl;
    }
    int main() {
      //called_from_async launched in a separate thread if possible
      std::future<void> result( std::async(called_from_async));
      std::cout << "Message from main." << std::endl;
      //ensure that called_from_async is launched synchronously
      //if it wasn't already launched
      result.get();
      return 0;
    }

    If I run it several times, sometimes the output is what I expect:

    Message from main.
    Async call

    But sometimes I get something like this:

    MAessysnacg ec aflrlom main.

    Why doesn't the cout always happen first? I clearly call the .get() method AFTER the cout.

  2. About parallel runs: suppose I have code like this:

    #include <future>
    #include <iostream>
    #include <vector>
    int twice(int m) {
      return 2 * m;
    }
    int main() {
      std::vector<std::future<int>> futures;
      for(int i = 0; i < 10; ++i) {
        futures.push_back (std::async(twice, i));
      }
      //retrieve and print the value stored in each future
      for(auto &e : futures) {
        std::cout << e.get() << std::endl;
      }
      return 0;
    }

    Will all 10 calls to twice run on separate cores simultaneously?

    If not, is there something in C++ similar to Python's multiprocessing library?

    What I am mainly looking for:

    I write a function and call it with n inputs using "multiprocessing", and the function runs once on each of n nodes at the same time.

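1) result.get() does not start a new thread; it waits for the result. With the default launch policy, the task is either started on another thread by the std::async(called_from_async) call, or deferred, in which case it runs inside get() on the calling thread. Which of the two happens is up to the implementation.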
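To see where a task actually runs, std::async accepts an explicit launch policy as its first argument. Below is a minimal sketch; the helper name ran_on_other_thread is made up for illustration. It compares the thread id seen inside the task with the id of the caller.

```cpp
#include <future>
#include <thread>

// Hypothetical helper: reports whether a task started with the given
// launch policy executed on a thread other than the caller's.
bool ran_on_other_thread(std::launch policy) {
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> worker =
        std::async(policy, [] { return std::this_thread::get_id(); });
    // With std::launch::deferred the lambda only runs here, inside get(),
    // on the calling thread. With std::launch::async it runs on a new thread.
    return worker.get() != caller;
}
```

Calling ran_on_other_thread(std::launch::async) should report a different thread, while ran_on_other_thread(std::launch::deferred) should report the caller's own thread.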

However, std::cout is only guaranteed to be free of data races when used from several threads. That guarantee does not stop characters written by concurrent threads from interleaving. So there is a race on the order of the output, and the mixed output you are showing can really happen. If you want whole lines to stay intact, you have to synchronize access to std::cout yourself.

2) Your calls can run in parallel, although the default launch policy does not guarantee it. How many cores are used depends on the OS and on the other processes running on your machine. There is a good chance that all of them will be used (assuming you control the whole machine and no other CPU-intensive processes are running in the background).
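If the goal is something like Python's pool.map, a rough equivalent can be sketched with std::async and an explicit std::launch::async policy, which asks for a separate thread per call. The helper parallel_map is hypothetical, not a standard function, and unlike a Python pool it starts one thread per input instead of reusing a fixed number of workers.

```cpp
#include <future>
#include <vector>

int twice(int m) {
    return 2 * m;
}

// Hypothetical helper: applies fn to every input on its own thread and
// returns the results in input order.
std::vector<int> parallel_map(int (*fn)(int), const std::vector<int>& inputs) {
    std::vector<std::future<int>> futures;
    futures.reserve(inputs.size());
    for (int value : inputs) {
        // std::launch::async forces a new thread instead of a deferred call.
        futures.push_back(std::async(std::launch::async, fn, value));
    }
    std::vector<int> results;
    results.reserve(inputs.size());
    for (auto& f : futures) {
        results.push_back(f.get());  // blocks until that call has finished
    }
    return results;
}
```

For example, parallel_map(twice, inputs) with inputs holding 0 to 9 returns the doubled values in the same order. std::thread::hardware_concurrency() gives a hint of how many threads can really run at once, though it may return 0 when the value is unknown.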

There is no multiprocessing-like library for C++ (at least not in the standard library). If you wish to run subprocesses, there are several options, e.g. the POSIX fork and popen functions.
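For the subprocess route, here is a minimal POSIX-only sketch of running one call in a forked child and reading the result back through a pipe. The helper run_in_child is invented for this example and error handling is kept to the bare minimum.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int twice(int m) {
    return 2 * m;
}

// Hypothetical helper: runs fn(arg) in a child process and stores the
// value in `result`. Returns false if any step fails.
bool run_in_child(int (*fn)(int), int arg, int& result) {
    int fds[2];
    if (pipe(fds) == -1) return false;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        // Child: compute the value, send it to the parent, exit at once.
        close(fds[0]);
        int value = fn(arg);
        ssize_t written = write(fds[1], &value, sizeof value);
        close(fds[1]);
        _exit(written == static_cast<ssize_t>(sizeof value) ? 0 : 1);
    }
    // Parent: read the value, then reap the child.
    close(fds[1]);
    ssize_t got = read(fds[0], &result, sizeof result);
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) == -1) return false;
    return got == static_cast<ssize_t>(sizeof result) &&
           WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A call such as run_in_child(twice, 21, value) leaves the doubled number in value when it succeeds. Unlike threads, the child gets its own copy of the address space, which is closer to how Python's multiprocessing behaves.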

