Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Gain from threading much less than expected - why?

I have a function evaluation which is somewhat slow. I'm trying to speed it up by using threading, since there are three things which can be done in parallel. The single-threaded version is

return dEdx_short(E) + dEdx_long(E) + dEdx_quantum(E);

where evaluation of those functions takes ~250us, ~250us, and ~100us respectively. So I implemented a three-thread solution:

double ret_short, ret_long, ret_quantum; // return values for the terms

auto shortF = [this,&E,&ret_short] () {ret_short = this->dEdx_short(E);};
std::thread t1(shortF);
auto longF = [this,&E,&ret_long] () {ret_long = this->dEdx_long(E);};
std::thread t2(longF);
auto quantumF = [this,&E,&ret_quantum] () {ret_quantum = this->dEdx_quantum(E);};
std::thread t3(quantumF);

t1.join();
t2.join();
t3.join();

return ret_short + ret_long + ret_quantum;

Which I expected to take ~300us, yet it actually takes ~600us - basically the same as the single-threaded version! These are all inherently thread-safe so there are no waits for locks. I checked the thread creation time on my system and it's ~25us. I'm not using all of my cores, so I'm a bit baffled as to why the parallel solution is so slow. Is it something to do with the lambda creation?

I tried to bypass the lambda, e.g.:

std::thread t1(&StopPow_BPS::dEdx_short, this, E, ret_short);

after rewriting the function being called, but that gave me an error attempt to use a deleted function...

like image 615
Alex Z Avatar asked Apr 08 '14 16:04

Alex Z


People also ask

What is the point of threading?

Threading allows for a more defined and precise shape and can create better definition for eyebrows. It is also used as a method of removing unwanted hair on the entire face and upper lip area.

Why is multithreading slower than single thread?

Every thread needs some overhead and system resources, so it also slows down performance. Another problem is the so called "thread explosion" when MORE thread are created than cores are on the system. And some waiting threads for the end of other threads is the worst idea for multi threading.

How many threads should I use?

General rule of thumb for threading an application: 1 thread per CPU Core. On a quad core PC that means 4. As was noted, the XBox 360 however has 3 cores but 2 hardware threads each, so 6 threads in this case.

How many threads can run at once?

A single CPU core can have up-to 2 threads per core. For example, if a CPU is dual core (i.e., 2 cores) it will have 4 threads.


1 Answers

Perhaps you are experiencing false sharing. To verify, store the return values in a type that uses an entire cache line (size depends on CPU).

const int cacheLineSize = 64; // bytes
union CacheFriendly
{
    double value;
    char dummy[cacheLineSize];
} ret_short, ret_long, ret_quantum; // return values for the terms
// ...
like image 92
D Drmmr Avatar answered Oct 29 '22 14:10

D Drmmr