Matrix Multiplication with Threads: Why is it not faster?

So I've been playing around with pthreads, specifically trying to calculate the product of two matrices. My code is extremely messy because it was just supposed to be a quick little fun project for myself, but the threading approach I used was very similar to this:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define M 3
#define K 2
#define N 3
#define NUM_THREADS 10

int A [M][K] = { {1,4}, {2,5}, {3,6} };
int B [K][N] = { {8,7,6}, {5,4,3} };
int C [M][N];

struct v {
   int i; /* row */
   int j; /* column */
};

void *runner(void *param); /* the thread */

int main(int argc, char *argv[]) {

   int i,j, count = 0;
   for(i = 0; i < M; i++) {
      for(j = 0; j < N; j++) {
         //Assign a row and column for each thread
         struct v *data = (struct v *) malloc(sizeof(struct v));
         data->i = i;
         data->j = j;
         /* Now create the thread passing it data as a parameter */
         pthread_t tid;       //Thread ID
         pthread_attr_t attr; //Set of thread attributes
         //Get the default attributes
         pthread_attr_init(&attr);
         //Create the thread
         pthread_create(&tid,&attr,runner,data);
         //Make sure the parent waits for all thread to complete
         pthread_join(tid, NULL);
         count++;
      }
   }

   //Print out the resulting matrix
   for(i = 0; i < M; i++) {
      for(j = 0; j < N; j++) {
         printf("%d ", C[i][j]);
      }
      printf("\n");
   }
}

//The thread will begin control in this function
void *runner(void *param) {
   struct v *data = param; // the structure that holds our data
   int n, sum = 0; //the counter and sum

   //Row multiplied by column
   for(n = 0; n< K; n++){
      sum += A[data->i][n] * B[n][data->j];
   }
   //assign the sum to its coordinate
   C[data->i][data->j] = sum;

   //Exit the thread
   pthread_exit(0);
}

source: http://macboypro.com/blog/2009/06/29/matrix-multiplication-in-c-using-pthreads-on-linux/

For the non-threaded version, I used the same setup (3 two-dimensional matrices, dynamically allocated structs to hold the row/column), and added a timer. The first trials indicated that the non-threaded version was faster. My first thought was that the dimensions were too small to notice a difference, and that creating the threads was taking longer than the work itself. So I upped the dimensions to about 50x50, randomly filled, and ran it, and I'm still not seeing any performance improvement from the threaded version.
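For reference, the non-threaded version I'm comparing against was roughly along these lines; the 50x50 size, the random fill, and the clock_gettime timer below are just a sketch of what I described, not my exact code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SIZE 50

int A[SIZE][SIZE], B[SIZE][SIZE], C[SIZE][SIZE];

int main(void) {
   int i, j, n;

   //Randomly fill the input matrices
   srand(time(NULL));
   for(i = 0; i < SIZE; i++)
      for(j = 0; j < SIZE; j++) {
         A[i][j] = rand() % 10;
         B[i][j] = rand() % 10;
      }

   struct timespec start, end;
   clock_gettime(CLOCK_MONOTONIC, &start);

   //Plain triple loop: one dot product per output cell
   for(i = 0; i < SIZE; i++)
      for(j = 0; j < SIZE; j++) {
         int sum = 0;
         for(n = 0; n < SIZE; n++)
            sum += A[i][n] * B[n][j];
         C[i][j] = sum;
      }

   clock_gettime(CLOCK_MONOTONIC, &end);
   printf("took %.3f ms\n",
          (end.tv_sec - start.tv_sec) * 1e3 +
          (end.tv_nsec - start.tv_nsec) / 1e6);
   return 0;
}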

What am I missing here?

asked Jun 06 '10 by prelic


1 Answer

Unless you're working with very large matrices (many thousands of rows and columns), you are unlikely to see much improvement from this approach. On a modern CPU/OS, setting up a thread is actually quite expensive in relative terms of CPU time: it costs far more than the handful of multiply-and-add operations each of your threads performs.
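To get a feel for the difference, a rough micro-benchmark along these lines (the iteration count and the no-op thread body are just placeholders for illustration) will typically show a create/join pair costing orders of magnitude more than the two multiply-adds each of your threads actually does:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NITER 1000
#define K 2

void *noop(void *arg) { return arg; }

double elapsed_ms(struct timespec a, struct timespec b) {
   return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void) {
   struct timespec t0, t1;
   volatile int sum = 0;
   int i, n;

   //Cost of NITER thread create/join pairs doing no work
   clock_gettime(CLOCK_MONOTONIC, &t0);
   for(i = 0; i < NITER; i++) {
      pthread_t tid;
      pthread_create(&tid, NULL, noop, NULL);
      pthread_join(tid, NULL);
   }
   clock_gettime(CLOCK_MONOTONIC, &t1);
   printf("%d create/join pairs: %.3f ms\n", NITER, elapsed_ms(t0, t1));

   //Cost of NITER two-element dot products (the work each thread would do)
   clock_gettime(CLOCK_MONOTONIC, &t0);
   for(i = 0; i < NITER; i++)
      for(n = 0; n < K; n++)
         sum += (i + n) * (i - n);
   clock_gettime(CLOCK_MONOTONIC, &t1);
   printf("%d tiny dot products: %.3f ms\n", NITER, elapsed_ms(t0, t1));
   return 0;
}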

Also, it's usually not worthwhile to set up more than one thread per CPU core that you have available. If you have, say, only two cores and you set up 2500 threads (for 50x50 matrices), then the OS is going to spend all its time managing and switching between those 2500 threads rather than doing your calculations.

If you were to set up two threads beforehand (still assuming a two-core CPU), keep those threads alive and waiting for work, and feed them the 2500 dot products you need to calculate through some kind of synchronised work queue, then you might start to see an improvement. Even then, though, it will never cut the run time by more than 50% compared with using a single core.
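A minimal sketch of that idea, with two workers pulling cell indices from a shared, mutex-protected counter (SIZE, NTHREADS, and the fill values here are placeholders, not tuned code):

#include <pthread.h>
#include <stdio.h>

#define SIZE 50      //50x50, as in the question's second test
#define NTHREADS 2   //roughly one thread per core

int A[SIZE][SIZE], B[SIZE][SIZE], C[SIZE][SIZE];

int next_cell = 0;   //index of the next dot product to claim
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
   (void)arg;
   for(;;) {
      //Claim the next unclaimed output cell
      pthread_mutex_lock(&lock);
      int cell = next_cell++;
      pthread_mutex_unlock(&lock);
      if(cell >= SIZE * SIZE)
         break;                      //no work left

      int i = cell / SIZE, j = cell % SIZE, n, sum = 0;
      for(n = 0; n < SIZE; n++)
         sum += A[i][n] * B[n][j];
      C[i][j] = sum;
   }
   return NULL;
}

int main(void) {
   int i, j, t;

   //Fill the inputs with something deterministic
   for(i = 0; i < SIZE; i++)
      for(j = 0; j < SIZE; j++) {
         A[i][j] = i + j;
         B[i][j] = i - j;
      }

   //Create the pool once, let the workers drain the queue, then join
   pthread_t tid[NTHREADS];
   for(t = 0; t < NTHREADS; t++)
      pthread_create(&tid[t], NULL, worker, NULL);
   for(t = 0; t < NTHREADS; t++)
      pthread_join(tid[t], NULL);

   printf("C[0][0] = %d\n", C[0][0]);
   return 0;
}

In practice you would hand each worker a contiguous block of rows rather than single cells, which reduces contention on the lock even further.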

answered Sep 28 '22 by Greg Hewgill