Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Matrix Multiply with Threads (each thread does single multiply)

I'm looking to do a matrix multiply using threads where each thread does a single multiplication and then the main thread will add up all of the results and place them in the appropriate spot in the final matrix (after the other threads have exited).

The way I am trying to do it is to create a single row array that holds the results of each thread. Then I would go through the array and add + place the results in the final matrix.

Ex: If you have the matrices:

A = [{1,4}, {2,5}, {3,6}] B = [{8,7,6}, {5,4,3}]

Then I want an array holding [8, 20, 7, 16, 6, 12, 16 etc] I would then loop through the array adding up every 2 numbers and placing them in my final array.

This is a HW assignment so I am not looking for exact code, but some logic on how to store the results in the array properly. I'm struggling with how to keep track of where I am in each matrix so that I don't miss any numbers.

Thanks.

EDIT2: Forgot to mention that there must be a single thread for every single multiplication to be done. Meaning for the example above, there will be 18 threads each doing its own calculation.

EDIT: I'm currently using this code as a base to work off of.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define M 3
#define K 2
#define N 3
#define NUM_THREADS 10

int A [M][K] = { {1,4}, {2,5}, {3,6} };
int B [K][N] = { {8,7,6}, {5,4,3} };
int C [M][N];

struct v {
   int i; /* row */
   int j; /* column */
};

void *runner(void *param); /* the thread */

int main(int argc, char *argv[]) {

   int i,j, count = 0;
   for(i = 0; i < M; i++) {
      for(j = 0; j < N; j++) {
         //Assign a row and column for each thread
         struct v *data = (struct v *) malloc(sizeof(struct v));
         data->i = i;
         data->j = j;
         /* Now create the thread passing it data as a parameter */
         pthread_t tid;       //Thread ID
         pthread_attr_t attr; //Set of thread attributes
         //Get the default attributes
         pthread_attr_init(&attr);
         //Create the thread
         pthread_create(&tid,&attr,runner,data);
         //Make sure the parent waits for all thread to complete
         pthread_join(tid, NULL);
         count++;
      }
   }

   //Print out the resulting matrix
   for(i = 0; i < M; i++) {
      for(j = 0; j < N; j++) {
         printf("%d ", C[i][j]);
      }
      printf("\n");
   }
}

//The thread will begin control in this function
void *runner(void *param) {
   struct v *data = param; // the structure that holds our data
   int n, sum = 0; //the counter and sum

   //Row multiplied by column
   for(n = 0; n< K; n++){
      sum += A[data->i][n] * B[n][data->j];
   }
   //assign the sum to its coordinate
   C[data->i][data->j] = sum;

   //Exit the thread
   pthread_exit(0);
}

Source: http://macboypro.wordpress.com/2009/05/20/matrix-multiplication-in-c-using-pthreads-on-linux/

like image 207
Kinru Avatar asked Nov 12 '22 09:11

Kinru


1 Answers

You need to store M * K * N element-wise products. The idea is presumably that the threads will all run in parallel, or at least will be able to do, so each thread needs its own distinct storage location of appropriate type. A straightforward way to do that would be to create an array with that many elements ... but of what element type?

Each thread will need to know not only where to store its result, but also which multiplication to perform. All of that information needs to be conveyed via a single argument of type void *. One would typically, then, create a structure type suitable for holding all the data needed by one thread, create an instance of that structure type for each thread, and pass pointers to those structures. Sounds like you want an array of structures, then.

The details could be worked a variety of ways, but the one that seems most natural to me is to give the structure members for the two factors, and a member in which to store the product. I would then have the main thread declare a 3D array of such structures (if the needed total number is smallish) or else dynamically allocate one. For example,

struct multiplication {
    // written by the main thread; read by the compute thread:
    int factor1;
    int factor2;

    // written by the compute thread; read by the main thread:
    int product;
} partial_result[M][K][N];

How to write code around that is left as the exercise it is intended to be.

like image 103
John Bollinger Avatar answered Nov 15 '22 07:11

John Bollinger