So, I was doing some benchmark tests with threads, and I wrote these pieces of code:
resp_threadless[] and resp_threaded[] are global int arrays, each of size n:
int n = 100000;
void function() {
    for (long j = 0; j < n; ++j) {
        int count = 0;
        double x = vetor[j];
        while (x > 1.0) {
            x = sqrt(x);
            ++count;
        }
        resp_threadless[j] = count;
    }
}
DWORD WINAPI function_th(LPVOID lpParam) {
    for (long j = 0; j < n; ++j) {
        int count = 0;
        double x = vetor[j];
        while (x > 1.0) {
            x = sqrt(x);
            ++count;
        }
        resp_threaded[j] = count;
    }
    return 0;
}
I benchmarked the first function by simply calling it:
function();
And the second one like this:
HANDLE hThreadArray[1];
DWORD dwThreads[1];
hThreadArray[0] = CreateThread(NULL, 0, function_th, NULL, 0, &dwThreads[0]);
WaitForMultipleObjects(1, hThreadArray, TRUE, INFINITE);
CloseHandle(hThreadArray[0]);
Keep in mind that I know running function_th() in a single thread will not parallelize anything; this is just a test. I was getting really strange results, so I decided to compare one thread against one plain function call running the SAME code.
I tested this on an Intel Atom N270 running Windows XP with NUMPROC = 1.
Results: serial code 1485 ms, one thread 425 ms.
I've had similar results using multiprocessor machines, and even with code using semaphores to parallelize the work done by the threads.
Does anyone have any idea what could be happening?
EDIT
Inverting the order, running each version multiple times, etc. -> no change
Higher n -> the threaded version is proportionally even faster
Using QueryPerformanceCounter() -> no change
Thread creation overhead -> this should make the threaded version slower, not faster
Original code: http://pastebin.com/tgmp5p1G
Every thread needs some overhead and system resources, so threading also slows down performance. Another problem is the so-called "thread explosion", when MORE threads are created than there are cores on the system. And having some threads wait for other threads to finish is the worst approach to multithreading.
On a single-core CPU, a single process (no separate threads) is usually faster than any threaded version. Threads do not magically make your CPU go any faster; they just mean extra work.
It's a cache hit matter. I suspect you ran the benchmark in the order you described it in your question: the function was called first and the thread afterwards. When you benchmark this in more detail, you will observe the reason: the data (for the sqrt loop) is already available in the cache, so the second run executes much faster.
To prove it, call
function()
twice or even more often before starting the thread.
Already the second call to function() will give the quicker result. Reason: all of the sqrt calculations (or at least many of them) have their data available in the cache and don't have to be fetched again. That's a lot faster.