Matrix Multiplication with Threads: Why is it not faster?

So I've been playing around with pthreads, specifically trying to calculate the product of two matrices. My code is extremely messy because it was just supposed to be a quick little fun project for myself, but the threading approach I used was very similar to this:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define M 3
#define K 2
#define N 3
#define NUM_THREADS 10

int A [M][K] = { {1,4}, {2,5}, {3,6} };
int B [K][N] = { {8,7,6}, {5,4,3} };
int C [M][N];

struct v {
   int i; /* row */
   int j; /* column */
};

void *runner(void *param); /* the thread */

int main(int argc, char *argv[]) {

   int i,j, count = 0;
   for(i = 0; i < M; i++) {
      for(j = 0; j < N; j++) {
         //Assign a row and column for each thread
         struct v *data = (struct v *) malloc(sizeof(struct v));
         data->i = i;
         data->j = j;
         /* Now create the thread passing it data as a parameter */
         pthread_t tid;       //Thread ID
         pthread_attr_t attr; //Set of thread attributes
         //Get the default attributes
         pthread_attr_init(&attr);
         //Create the thread
         pthread_create(&tid,&attr,runner,data);
         //Make sure the parent waits for all thread to complete
         pthread_join(tid, NULL);
         count++;
      }
   }

   //Print out the resulting matrix
   for(i = 0; i < M; i++) {
      for(j = 0; j < N; j++) {
         printf("%d ", C[i][j]);
      }
      printf("\n");
   }
}

//The thread will begin control in this function
void *runner(void *param) {
   struct v *data = param; // the structure that holds our data
   int n, sum = 0; //the counter and sum

   //Row multiplied by column
   for(n = 0; n< K; n++){
      sum += A[data->i][n] * B[n][data->j];
   }
   //assign the sum to its coordinate
   C[data->i][data->j] = sum;

   //Exit the thread
   pthread_exit(0);
}

source: http://macboypro.com/blog/2009/06/29/matrix-multiplication-in-c-using-pthreads-on-linux/

For the non-threaded version, I used the same setup (3 two-dimensional matrices, dynamically allocated structs to hold the row/column), and added a timer. The first trials indicated that the non-threaded version was faster. My first thought was that the dimensions were too small to notice a difference, and that creating the threads was taking longer than the work itself. So I upped the dimensions to about 50x50, randomly filled, and ran it, and I'm still not seeing any performance improvement from the threaded version.
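For reference, the non-threaded version I'm comparing against was roughly along these lines; the 50x50 size, the random fill, and the clock_gettime timer below are just a sketch of what I described, not my exact code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SIZE 50

int A[SIZE][SIZE], B[SIZE][SIZE], C[SIZE][SIZE];

int main(void) {
   int i, j, n;

   //Randomly fill the input matrices
   srand(time(NULL));
   for(i = 0; i < SIZE; i++)
      for(j = 0; j < SIZE; j++) {
         A[i][j] = rand() % 10;
         B[i][j] = rand() % 10;
      }

   struct timespec start, end;
   clock_gettime(CLOCK_MONOTONIC, &start);

   //Plain triple loop: one dot product per output cell
   for(i = 0; i < SIZE; i++)
      for(j = 0; j < SIZE; j++) {
         int sum = 0;
         for(n = 0; n < SIZE; n++)
            sum += A[i][n] * B[n][j];
         C[i][j] = sum;
      }

   clock_gettime(CLOCK_MONOTONIC, &end);
   printf("took %.3f ms\n",
          (end.tv_sec - start.tv_sec) * 1e3 +
          (end.tv_nsec - start.tv_nsec) / 1e6);
   return 0;
}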

What am I missing here?

asked Jun 06 '10 by prelic


1 Answer

Unless you're working with very large matrices (many thousands of rows and columns), you are unlikely to see much improvement from this approach. On a modern CPU/OS, setting up a thread is actually quite expensive in relative terms of CPU time: it costs far more than the handful of multiply-and-add operations each of your threads performs.
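To get a feel for the difference, a rough micro-benchmark along these lines (the iteration count and the no-op thread body are just placeholders for illustration) will typically show a create/join pair costing orders of magnitude more than the two multiply-adds each of your threads actually does:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NITER 1000
#define K 2

void *noop(void *arg) { return arg; }

double elapsed_ms(struct timespec a, struct timespec b) {
   return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void) {
   struct timespec t0, t1;
   volatile int sum = 0;
   int i, n;

   //Cost of NITER thread create/join pairs doing no work
   clock_gettime(CLOCK_MONOTONIC, &t0);
   for(i = 0; i < NITER; i++) {
      pthread_t tid;
      pthread_create(&tid, NULL, noop, NULL);
      pthread_join(tid, NULL);
   }
   clock_gettime(CLOCK_MONOTONIC, &t1);
   printf("%d create/join pairs: %.3f ms\n", NITER, elapsed_ms(t0, t1));

   //Cost of NITER two-element dot products (the work each thread would do)
   clock_gettime(CLOCK_MONOTONIC, &t0);
   for(i = 0; i < NITER; i++)
      for(n = 0; n < K; n++)
         sum += (i + n) * (i - n);
   clock_gettime(CLOCK_MONOTONIC, &t1);
   printf("%d tiny dot products: %.3f ms\n", NITER, elapsed_ms(t0, t1));
   return 0;
}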

Also, it's usually not worthwhile to set up more than one thread per CPU core that you have available. If you have, say, only two cores and you set up 2500 threads (for 50x50 matrices), then the OS is going to spend all its time managing and switching between those 2500 threads rather than doing your calculations.

If you were to set up two threads beforehand (still assuming a two-core CPU), keep those threads alive and waiting for work, and feed them the 2500 dot products you need to calculate through some kind of synchronised work queue, then you might start to see an improvement. Even then, though, it will never cut the run time by more than 50% compared with using a single core.
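A minimal sketch of that idea, with two workers pulling cell indices from a shared, mutex-protected counter (SIZE, NTHREADS, and the fill values here are placeholders, not tuned code):

#include <pthread.h>
#include <stdio.h>

#define SIZE 50      //50x50, as in the question's second test
#define NTHREADS 2   //roughly one thread per core

int A[SIZE][SIZE], B[SIZE][SIZE], C[SIZE][SIZE];

int next_cell = 0;   //index of the next dot product to claim
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
   (void)arg;
   for(;;) {
      //Claim the next unclaimed output cell
      pthread_mutex_lock(&lock);
      int cell = next_cell++;
      pthread_mutex_unlock(&lock);
      if(cell >= SIZE * SIZE)
         break;                      //no work left

      int i = cell / SIZE, j = cell % SIZE, n, sum = 0;
      for(n = 0; n < SIZE; n++)
         sum += A[i][n] * B[n][j];
      C[i][j] = sum;
   }
   return NULL;
}

int main(void) {
   int i, j, t;

   //Fill the inputs with something deterministic
   for(i = 0; i < SIZE; i++)
      for(j = 0; j < SIZE; j++) {
         A[i][j] = i + j;
         B[i][j] = i - j;
      }

   //Create the pool once, let the workers drain the queue, then join
   pthread_t tid[NTHREADS];
   for(t = 0; t < NTHREADS; t++)
      pthread_create(&tid[t], NULL, worker, NULL);
   for(t = 0; t < NTHREADS; t++)
      pthread_join(tid[t], NULL);

   printf("C[0][0] = %d\n", C[0][0]);
   return 0;
}

In practice you would hand each worker a contiguous block of rows rather than single cells, which reduces contention on the lock even further.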

answered Sep 28 '22 by Greg Hewgill