
Slow thread creation on Windows

I have upgraded a number-crunching application to a multi-threaded program, using the C++11 facilities. It works well on Mac OS X but does not benefit from multithreading on Windows (Visual Studio 2013). I measured the cost of starting threads with the following toy program:

#include <chrono>      // std::chrono::high_resolution_clock
#include <functional>  // std::ref
#include <iostream>
#include <thread>

void t1(int& k) {
    k += 1;
}

void t2(int& k) {
    k += 1;
}

int main(int argc, const char *argv[])
{
    int a{ 0 };
    int b{ 0 };

    auto start_time = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 10000; ++i) {
        std::thread thread1{ t1, std::ref(a) };
        std::thread thread2{ t2, std::ref(b) };
        thread1.join();
        thread2.join();
    }
    auto end_time = std::chrono::high_resolution_clock::now();
    auto time_stack = std::chrono::duration_cast<std::chrono::microseconds>(
        end_time - start_time).count();
    std::cout << "Time: " << time_stack / 10000.0 << " microseconds" <<
        std::endl;

    std::cout << a << " " << b << std::endl;

    return 0;
}

I have discovered that it takes 34 microseconds to start a thread on Mac OS X and 340 microseconds to do the same on Windows. Am I doing something wrong on the Windows side? Is it a compiler issue?

InsideLoop asked Dec 20 '22 09:12


2 Answers

Not a compiler problem (nor an operating system problem, strictly speaking).

Creating threads is well known to be an expensive operation. This is especially true under Windows (it used to be true under Linux as well, prior to clone).
Also, creating and joining a thread is necessarily slow and does not tell you much about the cost of creating a thread as such. Joining presumes that the thread has exited, which can only happen after it has been scheduled to run. Thus, your measurements include delays introduced by scheduling. Seen that way, the times you measure are actually pretty good (they could easily be 20 times longer!).
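One way to see how much of the measured time is scheduling rather than creation is to time the std::thread constructor and the join() call separately. The following is a minimal sketch, not a rigorous benchmark. SpawnTimings and measure_spawn are hypothetical names introduced here, and std::chrono::steady_clock is an assumption of this sketch rather than the clock used in the question.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical result type for this sketch.
struct SpawnTimings {
    double create_us;  // average time spent in the std::thread constructor
    double join_us;    // average time spent waiting in join()
    int runs;          // number of thread bodies that actually executed
};

// Hypothetical helper: times construction and join() separately.
SpawnTimings measure_spawn(int iterations) {
    using clock = std::chrono::steady_clock;
    using us = std::chrono::duration<double, std::micro>;
    assert(iterations > 0);

    us create_total{0};
    us join_total{0};
    int counter = 0;
    for (int i = 0; i < iterations; ++i) {
        const auto t0 = clock::now();
        std::thread worker([&counter] { ++counter; });
        const auto t1 = clock::now();  // constructor has returned
        worker.join();                 // waits until the thread ran and exited
        const auto t2 = clock::now();
        create_total += t1 - t0;
        join_total += t2 - t1;
    }
    return {create_total.count() / iterations,
            join_total.count() / iterations,
            counter};
}
```

Calling measure_spawn(10000) and printing both averages shows how the total splits between the constructor call and the wait for the thread to be scheduled and finish. The split will vary with the platform and system load.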

However, it does not matter a lot whether spawning threads is slow anyway.

Creating 20,000 threads in a real program, as your benchmark does, is a serious error. While it is not strictly illegal or disallowed to create thousands (even millions) of threads, the "correct" way of using threads is to create roughly as many threads as there are CPU cores, and no more. One does not create very short-lived threads all the time either.
You might have a few short-lived ones, and you might create a few extra threads (which e.g. block on I/O), but you will not want to create hundreds or thousands of these. Every additional thread (beyond the number of CPU cores) means more context switches, more scheduler work, more cache pressure, and 1 MB of address space and 64 kB of physical memory gone per thread (due to stack reserve and commit granularity).

Now, if you create, say, 10 threads at program start, it does not matter at all whether this takes 3 milliseconds altogether. The program takes several hundred milliseconds (at least) to start up anyway, so nobody will notice the difference.
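A sketch of that pattern, assuming the work can be split into independent chunks: create roughly one thread per core, once, give each a large slice of the input, and join them all at the end. parallel_sum is a hypothetical helper written for illustration, not part of the original program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` with about one thread per hardware core.
std::uint64_t parallel_sum(const std::vector<std::uint32_t>& data) {
    // hardware_concurrency() may return 0 when the value is unknown.
    const std::size_t workers =
        std::max(1u, std::thread::hardware_concurrency());
    // Round up so that the chunks cover the whole input.
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    assert(chunk * workers >= data.size());

    std::vector<std::uint64_t> partial(workers, 0);
    std::vector<std::thread> threads;
    threads.reserve(workers);

    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        // Each thread writes only to its own slot, so no locking is needed.
        threads.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end,
                                         std::uint64_t{0});
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return std::accumulate(partial.begin(), partial.end(), std::uint64_t{0});
}
```

Here the creation cost is paid once per worker rather than once per tiny task, so it becomes negligible as long as each chunk is large compared to the time needed to start a thread.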

Damon answered Dec 30 '22 22:12


Visual C++ uses the Concurrency Runtime (ConcRT, Microsoft-specific) to implement the std::thread features. When you directly call any Concurrency Runtime function, it creates a default runtime object (not going into details). Likewise, when you call a std::thread function, it does the same as if a ConcRT function had been invoked.

Creating the default runtime (or, say, scheduler) takes some time, and hence thread creation appears to take some time. Try creating a std::thread object and letting it run, and then execute the benchmarking code (the whole of the code above, for example).
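A minimal sketch of that suggestion: run one throwaway thread before the timed loop so that any one-time runtime setup is not included in the measurement. Both function names are hypothetical, and whether the warm-up changes the numbers on a given toolchain is something to measure, not something this code guarantees.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: average microseconds to create and join one thread.
double average_spawn_join_us(int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::thread worker([] {});
        worker.join();
    }
    const std::chrono::duration<double, std::micro> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iterations;
}

// Runs one untimed thread first, then performs the timed measurement.
double warmed_up_average_us(int iterations) {
    std::thread warm_up([] {});
    warm_up.join();
    return average_spawn_join_us(iterations);
}
```

Comparing average_spawn_join_us(10000) in a fresh process with warmed_up_average_us(10000) in another fresh process indicates how much of the cost, if any, comes from first-use initialization.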

EDIT:

  • Skim over this article: http://www.codeproject.com/Articles/80825/Concurrency-Runtime-in-Visual-C
  • Use step-into debugging to see when the ConcRT library is invoked and what it is doing.

Ajay answered Dec 30 '22 21:12