
Why is a single thread faster than multiple threads even though they essentially have the same overhead?

I am running 64-bit Windows 7 on an 8-core processor. I ran the following:

    #include "stdafx.h"
    #include <iostream>
    #include <Windows.h>
    #include <process.h>
    #include <ctime>

    using namespace std;

    int count = 0;
    int t = time(NULL);

    //poop() loops incrementing count until it is 300 million.
    void poop(void* params) {
        while(count < 300000000) {
            count++;
        }


        cout<< time(NULL) - t <<" \n";
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
        //_beginthread(poop, 0, NULL);      
        //_beginthread(poop, 0, NULL);
        poop(NULL);

        cout<<"done"<<endl;

        while(1);

        return 0;
    }

I compared the result to when I uncommented the _beginthread calls. It turns out the single-threaded version finishes the fastest! In fact, adding more threads makes the process take even longer. Making the count 300 million made the process take 8+ seconds, which I figured was long enough to rule out the _beginthread calls and other minor overhead.

I did a little research, and the usual explanation for a multithreaded process being slower is overhead. But in this case, whether I run multiple threads or a single one, the number of times the variable count (which lives in the data segment because it's a preallocated variable, afaik) is accessed is the same. So the overhead (if it is an overhead issue) is not coming from it costing more to access a global variable than a local one.

Looking at Task Manager, the process with a single thread uses about 13% CPU (roughly 1/8 of the cores), and each added thread increases CPU usage by roughly another 1/8. So in terms of CPU power, assuming Task Manager is depicting this accurately, adding threads uses more CPU. Which confuses me further: how can I be using more CPU overall, on separate cores, yet take longer to accomplish the task?

TLDR: WHY DOES THIS HAPPEN

asked Mar 28 '13 by lululoo


2 Answers

Your code is inherently wrong.

count++ is a three-step operation that reads the value, increments it, then stores it back into the variable.
If two threads run count++ at once on the same variable, one of them will overwrite the other one's changes.

Therefore, the multi-threaded version will end up doing extra work as each thread clobbers the other threads' progress.

If you make count a local variable, the timing should look much more normal.

Alternatively, you could use an interlocked increment, which is thread-safe, but has extra overhead to synchronize across threads.
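
A minimal sketch of what that looks like with Win32's InterlockedIncrement (the loop bound and function name are kept from the question; treat this as an illustration of the approach, not a drop-in fix):

    #include <Windows.h>

    // InterlockedIncrement works on a LONG; volatile matches the API signature.
    volatile LONG count = 0;

    void poop(void* params) {
        // Atomically add 1 and read the new value back, so no increment is ever lost.
        // The threads now genuinely share the work of reaching 300 million,
        // but every increment forces the cores to synchronize on the same cache line.
        while (InterlockedIncrement(&count) < 300000000) {
        }
    }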

answered Oct 11 '22 by SLaks


As some of the commenters on your original question have pointed out, you have both a correctness and a performance issue. Firstly, all your threads are accessing count concurrently, so there is no guarantee that the threads will actually all count to 300 million. You can solve this correctness bug by declaring count within your poop function:

    void poop(void* params) {
        int count = 0;  // each thread now counts with its own local variable
        while(count < 300000000) {
            count++;
        }
        cout<< time(NULL) - t <<" \n";
    }

Note that this is not a problem for t, because it is only read, not written, by the threads. However, it is a problem for cout, as you're also writing to that from multiple threads.
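
If you keep the printout inside the thread function, one option is to serialize the writes to cout. A sketch using a C++11 std::mutex (coutMutex and printElapsed are illustrative names, not from the original code):

    #include <iostream>
    #include <mutex>

    std::mutex coutMutex;  // guards cout so lines from different threads don't interleave

    void printElapsed(int elapsed) {
        std::lock_guard<std::mutex> lock(coutMutex);
        std::cout << elapsed << " \n";
    }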

In addition, as pointed out in the comments, all your threads are accessing a single memory location. This means that whenever one thread updates count, the cache line that holds it must be flushed from the other cores and reloaded, which is very inefficient memory access. Typically this problem comes up when threads access consecutive elements in an array rather than a single variable (a bad idea for the reasons above); when the threads touch different elements that happen to share a cache line, it is known as false sharing. The solution there is to pad your array so that each entry occupies an exact multiple of your L1 cache line size, which is obviously somewhat specific to your target processor. Another option is to restructure your algorithm so that either each thread processes a large block of consecutive elements, or each thread accesses elements in such a way that threads do not touch adjacent locations.
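
A sketch of the padding idea (the 64-byte line size is an assumption, so check your target CPU; PaddedCounter and countUp are names made up for illustration): each thread gets its own counter sitting in its own cache line.

    #include <cstdint>
    #include <process.h>

    // Pad each per-thread counter out to an assumed 64-byte cache line,
    // so two threads never write to the same line.
    struct PaddedCounter {
        int value;
        char padding[64 - sizeof(int)];
    };

    PaddedCounter counters[8];  // one slot per thread

    void countUp(void* params) {
        int index = (int)(intptr_t)params;  // which slot this thread owns
        while (counters[index].value < 300000000) {
            counters[index].value++;
        }
    }

    // started e.g. with: _beginthread(countUp, 0, (void*)0); _beginthread(countUp, 0, (void*)1);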

As you're using Windows, you might want to consider using a higher level of abstraction for your code rather than the raw Win32 thread functions. The Parallel Patterns Library fits the bill here (as does Intel's Threading Building Blocks).

    #include <ppl.h>  // declares concurrency::parallel_invoke

    concurrency::parallel_invoke(
        [=] { poop(nullptr); },
        [=] { poop(nullptr); }
    );

This allows the PPL to schedule your tasks on a thread pool, rather than your application having to explicitly create threads.
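
If you would rather not manage the padding yourself, PPL's concurrency::combinable gives each worker thread its own private accumulator plus a final merge step. A sketch under those assumptions (countWithCombinable is an illustrative name, not from the original code):

    #include <ppl.h>
    #include <functional>
    #include <iostream>

    void countWithCombinable() {
        // One private counter per worker thread, each starting at zero.
        concurrency::combinable<int> partial([] { return 0; });

        concurrency::parallel_invoke(
            [&] { for (int i = 0; i < 150000000; ++i) partial.local()++; },
            [&] { for (int i = 0; i < 150000000; ++i) partial.local()++; }
        );

        // Merge the per-thread totals once, after the parallel work is done.
        std::cout << partial.combine(std::plus<int>()) << std::endl;
    }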

You might also consider that for really small tasks the overhead of starting additional threads may outweigh the gains of running in parallel.

answered Oct 11 '22 by Ade Miller