So, I was doing some benchmark tests with threads, and I wrote these pieces of code:
resp_threadless[] and resp_threaded[] are global int arrays, each of size n:
int n = 100000;
void function() {
    for (long j = 0; j < n; ++j) {
        int count = 0;
        double x = vetor[j];
        while (x > 1.0) {
            x = sqrt(x);
            ++count;
        }
        resp_threadless[j] = count;
    }
}
DWORD WINAPI function_th(LPVOID lpParam) {
    for (long j = 0; j < n; ++j) {
        int count = 0;
        double x = vetor[j];
        while (x > 1.0) {
            x = sqrt(x);
            ++count;
        }
        resp_threaded[j] = count;
    }
    return 0;
}
I benchmarked the first function by simply calling it:
function();
And the second one like this:
HANDLE hThreadArray[1];
DWORD dwThreads[1];
hThreadArray[0] = CreateThread(NULL, 0, function_th, NULL, 0, &dwThreads[0]);
WaitForMultipleObjects(1, hThreadArray, TRUE, INFINITE);
CloseHandle(hThreadArray[0]);
Keep in mind that I know running function_th() in a single thread will not parallelize anything; this is just a test. I was getting really strange results, so I decided to compare one thread against one plain function call running the SAME code.
I tested this on an Intel Atom N270 running Windows XP with NUMPROC = 1.
Results: serial code 1485 ms, one thread 425 ms.
I've had similar results using multiprocessor machines, and even with code using semaphores to parallelize the work done by the threads.
Does anyone have any idea what could be happening?
EDIT
Inverting the order, running each version multiple times, etc. -> no change
Higher n -> the threaded version is proportionally even faster
Using QueryPerformanceCounter() -> no change
Thread creation overhead -> this should make the threaded version slower, not faster
Original code: http://pastebin.com/tgmp5p1G
Every thread needs some overhead and system resources, so threading also slows down performance. Another problem is the so-called "thread explosion", when MORE threads are created than there are cores on the system. And having some threads wait for other threads to finish is the worst approach to multithreading.
On a single-core CPU, a single process (no separate threads) is usually faster than any threaded version. Threads do not magically make your CPU go any faster; they just mean extra work.
It's a cache hit matter. I suspect you ran the benchmark in the order you described it in your question: the function was called first and the thread afterwards. When you benchmark this in more detail, you will observe the reason: the data (for the sqrt loop) is already available in the cache, so the second run executes much faster.
To prove it, call
function()
twice or even more often before starting the thread.
Already the second call to function() will give the quicker result. Reason: all of the sqrt calculations (or at least many of them) have their data available in the cache and don't have to be fetched again. That's a lot faster.