On Linux GCC/pthread parallel code is much slower than simple single thread code

Question

I am testing pthread parallel code on Linux with gcc (GCC) 4.8.3 20140911 on a CentOS 7 server.

The single-threaded version is simple; it initializes a 10000 * 10000 matrix:

#include <stdlib.h>

int main(void)
{
    int size = 10000;

    int * r = (int*)malloc(size * size * sizeof(int));
    /* Fill every cell of the matrix with a pseudo-random value. */
    for (int i=0; i<size; i++) {
        for (int j=0; j<size; j++) {
            r[i * size + j] = rand();
        }
    }
    free(r);
    return 0;
}

Then I wanted to see if parallel code can improve the performance:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int size = 10000;

void *SetOdd(void *param) 
{
   printf("Enter odd\n");
   int * r      = (int*)param;
   for (int i=0; i<size; i+=2) {
         for (int j=0; j<size; j++) {
                r[i * size + j] = rand();
         }
   }
   printf("Exit Odd\n");
   pthread_exit(NULL);
   return 0;
} 

void *SetEven(void *param) 
{ 
   printf("Enter Even\n");
   int * r      = (int*)param;
   for (int i=1; i<size; i+=2) {
        for (int j=0; j<size; j++) {
                r[i * size + j] = rand();
        }
   }
   printf("Exit Even\n");
   pthread_exit(NULL);
   return 0;
} 

int main(void)
{
     printf("running in thread\n");
     pthread_t threads[2];
     int * r = (int*)malloc(size * size * sizeof(int));
     int rc0 = pthread_create(&threads[0], NULL, SetOdd, (void *)r); 
     int rc1 = pthread_create(&threads[1], NULL, SetEven, (void *)r); 
     for(int t=0; t<2; t++) {
           void* status;
           int rc = pthread_join(threads[t], &status);
           if (rc)  {
               printf("ERROR; return code from pthread_join() is %d\n", rc);
               exit(-1);
            }
            printf("Completed join with thread %d status= %ld\n", t, (long)status);
        }

   free(r);
   return 0;
}

The simple code runs in about 0.8 seconds, while the multithreaded version takes about 10 seconds!

I am running on a 4-core server. Why is the multithreaded version so slow?

P.P · Accepted Answer

The C and POSIX standards do not require rand() to be thread-safe or re-entrant, so it is not a good choice for multi-threaded applications.

Use rand_r() instead. It is also a pseudo-random generator, and it is thread-safe because each caller passes in its own state. Using rand_r() results in a shorter execution time for your code on my system with 2 cores (roughly half the time of the single-threaded version).

In both of your thread functions, do:

void *SetOdd(void *param)
{
   printf("Enter odd\n");
   /* Private generator state for rand_r(); time() needs #include <time.h>. */
   unsigned int s = (unsigned int)time(0);

   int * r      = (int*)param;
   for (int i=0; i<size; i+=2) {
         for (int j=0; j<size; j++) {
                r[i * size + j] = rand_r(&s);
         }
   }
   printf("Exit Odd\n");
   pthread_exit(NULL);
   return 0;
}
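The same idea can be folded into a single worker function instead of two near-identical ones. The following is only a sketch under stated assumptions: the names `fill_job`, `fill_rows`, `fill_matrix` and `MAX_FILL_THREADS` are hypothetical and do not appear in the question, and each thread gets its own seed (`base_seed + t`) so that two threads do not produce the same sequence, which is what happens when both are seeded with the same time(0) value.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes rand_r() under strict -std= modes */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_FILL_THREADS 16  /* arbitrary cap chosen for this sketch */

/* Hypothetical per-thread job description (not part of the original code). */
struct fill_job {
    int *matrix;        /* shared size * size buffer */
    int size;           /* matrix dimension */
    int first_row;      /* first row this thread fills */
    int row_step;       /* distance between rows, equal to the thread count */
    unsigned int seed;  /* initial rand_r() state for this thread */
};

static void *fill_rows(void *param)
{
    struct fill_job *job = param;
    /* Copy the seed to the stack so the hot loop touches only thread-local data. */
    unsigned int seed = job->seed;

    for (int i = job->first_row; i < job->size; i += job->row_step) {
        for (int j = 0; j < job->size; j++) {
            job->matrix[(size_t)i * job->size + j] = rand_r(&seed);
        }
    }
    return NULL;
}

/* Fills the matrix with nthreads workers; returns 0 or a pthread error code. */
int fill_matrix(int *matrix, int size, int nthreads, unsigned int base_seed)
{
    pthread_t threads[MAX_FILL_THREADS];
    struct fill_job jobs[MAX_FILL_THREADS];
    int started = 0, rc = 0;

    assert(nthreads > 0 && nthreads <= MAX_FILL_THREADS);

    for (int t = 0; t < nthreads; t++) {
        jobs[t] = (struct fill_job){ matrix, size, t, nthreads,
                                     base_seed + (unsigned int)t };
        rc = pthread_create(&threads[t], NULL, fill_rows, &jobs[t]);
        if (rc != 0)
            break;
        started++;
    }
    /* Join only the threads that were actually started. */
    for (int t = 0; t < started; t++) {
        int jrc = pthread_join(threads[t], NULL);
        if (rc == 0)
            rc = jrc;
    }
    return rc;
}
```

For the matrix in the question, a call such as `fill_matrix(r, size, 2, (unsigned int)time(0))` would take the place of the two hand-written thread functions.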

Update:

While the C and POSIX standards do not mandate rand() to be a thread-safe function, the glibc implementation (used on Linux) actually does implement it in a thread-safe manner.

If we look at the glibc implementation of rand(), there's a lock:

  __libc_lock_lock (lock);

  (void) __random_r (&unsafe_state, &retval);

  __libc_lock_unlock (lock);

Any synchronization construct (mutex, condition variable, etc.) has a performance cost, so the fewer such constructs the code uses, the better it performs (of course, we can't avoid them completely in multi-threaded applications).

So only one thread at a time can actually access the random number generator, and both threads are fighting for the lock all the time. This explains why rand() leads to poor performance in multi-threaded code.
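To make the two shapes concrete, here is a minimal sketch that contrasts a generator with one shared state behind a mutex (the pattern in the glibc excerpt) with a generator whose state is owned by the caller (the pattern of rand_r()). Everything in it is an assumption for illustration: `toy_step` is a toy linear congruential step, not glibc's algorithm, and `locked_next`, `private_next`, `hammer_locked` and `run_locked_workers` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Toy linear congruential step for illustration; NOT glibc's algorithm. */
static uint32_t toy_step(uint32_t *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Pattern 1: one shared state guarded by a lock. */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static uint32_t shared_state = 1;
static unsigned long shared_calls = 0;

uint32_t locked_next(void)
{
    pthread_mutex_lock(&shared_lock);      /* every caller serializes here */
    uint32_t value = toy_step(&shared_state);
    shared_calls++;
    pthread_mutex_unlock(&shared_lock);
    return value;
}

/* Pattern 2: caller-owned state, the shape of rand_r(); no lock needed. */
uint32_t private_next(uint32_t *state)
{
    return toy_step(state);
}

static void *hammer_locked(void *param)
{
    unsigned long calls = *(unsigned long *)param;
    for (unsigned long i = 0; i < calls; i++)
        (void)locked_next();
    return NULL;
}

/* Runs nthreads (at most 8) workers against the shared generator and
   returns how many calls went through the lock in total.
   Error handling is sketch-level: failures simply trip an assert. */
unsigned long run_locked_workers(int nthreads, unsigned long calls_each)
{
    pthread_t threads[8];

    assert(nthreads > 0 && nthreads <= 8);
    shared_calls = 0;
    for (int t = 0; t < nthreads; t++) {
        int rc = pthread_create(&threads[t], NULL, hammer_locked, &calls_each);
        assert(rc == 0);
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(threads[t], NULL);
    return shared_calls;
}
```

Both functions produce the same values from the same starting state; the difference is that every call to `locked_next` from any thread has to pass through `shared_lock`, while threads calling `private_next` on their own state never wait for each other.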

Tags: performance, c, gcc, pthreads
