
On Linux, GCC/pthread parallel code is much slower than simple single-threaded code

I am testing pthread parallel code on Linux with gcc (GCC) 4.8.3 20140911, on a CentOS 7 Server.

The single-threaded version is simple; it initializes a 10000 * 10000 matrix:

#include <stdlib.h>

int main(void)
{
    int size = 10000;

    int * r = (int*)malloc(size * size * sizeof(int));
    for (int i=0; i<size; i++) {
            for (int j=0; j<size; j++) {
                r[i * size + j] = rand();
            }
    }
    free(r);
    return 0;
}

Then I wanted to see if parallel code can improve the performance:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int size = 10000;

/* Despite its name, this fills the even-numbered rows (0, 2, 4, ...). */
void *SetOdd(void *param)
{
   printf("Enter odd\n"); 
   int * r      = (int*)param;
   for (int i=0; i<size; i+=2) {
         for (int j=0; j<size; j++) {
                r[i * size + j] = rand();
         }
   }
   printf("Exit Odd\n");
   pthread_exit(NULL);
   return 0;
} 

/* Despite its name, this fills the odd-numbered rows (1, 3, 5, ...). */
void *SetEven(void *param)
{ 
   printf("Enter Even\n");
   int * r      = (int*)param;
   for (int i=1; i<size; i+=2) {
        for (int j=0; j<size; j++) {
                r[i * size + j] = rand();
        }
   }
   printf("Exit Even\n");
   pthread_exit(NULL);
   return 0;
} 

int main(void)
{
     printf("running in thread\n");
     pthread_t threads[2];
     int * r = (int*)malloc(size * size * sizeof(int));
     int rc0 = pthread_create(&threads[0], NULL, SetOdd, (void *)r);
     int rc1 = pthread_create(&threads[1], NULL, SetEven, (void *)r);
     if (rc0 || rc1) {
           printf("ERROR; return code from pthread_create() is %d\n", rc0 ? rc0 : rc1);
           exit(-1);
     }
     for(int t=0; t<2; t++) {
           void* status;
           int rc = pthread_join(threads[t], &status);
           if (rc)  {
               printf("ERROR; return code from pthread_join() is %d\n", rc);
               exit(-1);
           }
           printf("Completed join with thread %d status= %ld\n", t, (long)status);
     }

     free(r);
     return 0;
}

The simple code runs for about 0.8 seconds, while the multithreaded version runs for about 10 seconds!

I am running on a 4-core server. Why is the multithreaded version so slow?

asked Feb 06 '26 10:02 by LFF


1 Answer

The C and POSIX standards do not require rand() to be thread-safe or re-entrant, so you shouldn't rely on rand() in multi-threaded applications.

Use rand_r() instead. It is also a pseudo-random number generator, but it keeps its state in a variable supplied by the caller, so each thread can work on its own state. On my 2-core system, using rand_r() makes your code run in roughly half the time of the single-threaded version.

In both of your thread functions, do:

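/* Requires #include <time.h> for time(). */
void *SetOdd(void *param)
{
   printf("Enter odd\n");
   /* Per-thread generator state; rand_r() reads and updates only this variable.
      Two threads that call time(0) within the same second get the same seed
      and therefore the same sequence; mix in a per-thread value if that matters. */
   unsigned int s = (unsigned int)time(0);

   int * r      = (int*)param;
   for (int i=0; i<size; i+=2) {
         for (int j=0; j<size; j++) {
                r[i * size + j] = rand_r(&s);
         }
   }
   printf("Exit Odd\n");
   pthread_exit(NULL);
   return 0;
}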
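
For completeness, here is a minimal sketch of the same idea with one worker function instead of two near-identical copies. The names fill_job, fill_rows and fill_matrix_parallel, and the base_seed parameter, are my own and not from your code. Each thread copies its seed into a local variable, so rand_r() never touches memory shared with another thread.

```c
#define _POSIX_C_SOURCE 200809L  /* rand_r() is POSIX, not ISO C */
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread work description (not from the original post). */
struct fill_job {
    int *matrix;        /* shared size*size buffer */
    int size;           /* matrix dimension */
    int first_row;      /* first row this thread fills */
    int row_step;       /* distance between its rows (= number of threads) */
    unsigned int seed;  /* initial rand_r() state for this thread */
};

static void *fill_rows(void *param)
{
    struct fill_job *job = param;
    unsigned int seed = job->seed;  /* local copy: state stays on this thread's stack */

    for (int i = job->first_row; i < job->size; i += job->row_step) {
        for (int j = 0; j < job->size; j++) {
            job->matrix[(size_t)i * job->size + j] = rand_r(&seed);
        }
    }
    return NULL;
}

/* Fills the matrix using nthreads threads; returns 0 on success, nonzero on failure. */
int fill_matrix_parallel(int *matrix, int size, int nthreads, unsigned int base_seed)
{
    if (nthreads < 1)
        return -1;

    pthread_t *threads = malloc((size_t)nthreads * sizeof *threads);
    struct fill_job *jobs = malloc((size_t)nthreads * sizeof *jobs);
    if (!threads || !jobs) {
        free(threads);
        free(jobs);
        return -1;
    }

    int started = 0;
    int rc = 0;
    for (int t = 0; t < nthreads; t++) {
        /* Thread t handles rows t, t + nthreads, t + 2*nthreads, ... */
        jobs[t] = (struct fill_job){ matrix, size, t, nthreads,
                                     base_seed + (unsigned int)t };
        rc = pthread_create(&threads[t], NULL, fill_rows, &jobs[t]);
        if (rc)
            break;
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int t = 0; t < started; t++)
        pthread_join(threads[t], NULL);

    free(threads);
    free(jobs);
    return rc;
}
```

Called as fill_matrix_parallel(r, size, 2, (unsigned int)time(0)), this should do the work of SetOdd and SetEven together, with a different seed per thread. Build with something like gcc -std=c99 -pthread.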

Update:

While the C and POSIX standards do not mandate rand() to be a thread-safe function, the glibc implementation (used on Linux) actually does implement it in a thread-safe manner.

If we look at the glibc implementation of rand(), there's a lock around the call that produces the number:

  __libc_lock_lock (lock);

  (void) __random_r (&unsafe_state, &retval);

  __libc_lock_unlock (lock);
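
To make that concrete, here is a simplified model of a generator with one shared state behind one lock. This is an illustration only, not the glibc source: locked_rand_model, model_lock and model_state are made-up names, and rand_r() merely stands in for the internal __random_r().

```c
#define _POSIX_C_SOURCE 200809L  /* rand_r() is POSIX, not ISO C */
#include <pthread.h>
#include <stdlib.h>

/* One lock and one generator state shared by every thread in the process. */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int model_state = 1u;

/* Every caller, from any thread, has to pass through the same mutex. */
int locked_rand_model(void)
{
    pthread_mutex_lock(&model_lock);
    int value = rand_r(&model_state);  /* stand-in for the real generator step */
    pthread_mutex_unlock(&model_lock);
    return value;
}
```

With this shape, two threads calling locked_rand_model() in a tight loop spend most of their time handing the lock back and forth rather than running in parallel.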

Any synchronization construct (mutex, condition variable, etc.) has a performance cost, so the fewer such constructs the code uses, the better it performs (of course, we can't avoid them completely in multi-threaded applications).

So only one thread at a time can actually access the random number generator, and both threads are fighting for the lock all the time. This explains why rand() leads to poor performance in multi-threaded code.
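
If you want to check this on your own machine, a rough timing harness along these lines could help. It is only a sketch: bench_arg, bench_worker and time_two_threads are hypothetical names, and it assumes a glibc system where calling rand() from two threads is safe but serialized by the lock.

```c
#define _POSIX_C_SOURCE 200809L  /* for rand_r() and clock_gettime() */
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical benchmark argument; not part of the original code. */
struct bench_arg {
    long calls;         /* how many random numbers to draw */
    int use_rand_r;     /* nonzero: rand_r() with private state; zero: rand() */
    unsigned int sink;  /* stores the result so the loop is not optimized away */
};

static void *bench_worker(void *param)
{
    struct bench_arg *arg = param;
    unsigned int seed = 1u;  /* private to this thread */
    unsigned int acc = 0;

    for (long k = 0; k < arg->calls; k++)
        acc += (unsigned int)(arg->use_rand_r ? rand_r(&seed) : rand());
    arg->sink = acc;
    return NULL;
}

/* Runs two threads and returns the elapsed wall-clock seconds, or -1.0 on error. */
double time_two_threads(int use_rand_r, long calls_per_thread)
{
    pthread_t th[2];
    struct bench_arg args[2];
    struct timespec t0, t1;
    int started = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    for (int t = 0; t < 2; t++) {
        args[t].calls = calls_per_thread;
        args[t].use_rand_r = use_rand_r;
        args[t].sink = 0;
        if (pthread_create(&th[t], NULL, bench_worker, &args[t]) != 0)
            break;
        started++;
    }
    /* Join whatever was started before args[] goes out of scope. */
    for (int t = 0; t < started; t++)
        pthread_join(th[t], NULL);
    if (started < 2 || clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Comparing time_two_threads(0, n) against time_two_threads(1, n) for a large n should show the gap. I'm deliberately not quoting numbers here, since they depend on the machine, the compiler flags and the glibc version.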

answered Feb 09 '26 01:02 by P.P