This is the code I wrote:
#include <omp.h>

/* Blocked (tiled) multiplication of TSIZE x TSIZE int matrices.
   Assumes TSIZE is defined elsewhere, is divisible by the block size B,
   and that c is zero-initialized. */
void matrix_multi(int c[][TSIZE], int a[][TSIZE], int b[][TSIZE])
{
    int B = 8;
    int i, j, k, i1, j1, k1;
    /* collapse(2), not collapse(3): collapsing the k loop as well would let
       two threads update the same c[i1][j1] from different k blocks, which
       is a data race. */
    #pragma omp parallel for private(i, j, k, i1, j1, k1) schedule(auto) collapse(2)
    for (i = 0; i < TSIZE; i += B)
        for (j = 0; j < TSIZE; j += B)
            for (k = 0; k < TSIZE; k += B)
                for (i1 = i; i1 < i + B; i1++)
                    for (j1 = j; j1 < j + B; j1++)
                    {
                        int sum = 0;
                        for (k1 = k; k1 < k + B; k1++)
                        {
                            sum += a[i1][k1] * b[k1][j1];
                        }
                        c[i1][j1] += sum;
                    }
}
My question is: can I get better performance with some further manipulation of the three inner loops?
In linear algebra, the Strassen algorithm, named after Volker Strassen, is an algorithm for matrix multiplication. It is faster than the standard algorithm for large matrices, with a better asymptotic complexity (O(n^2.807) rather than O(n^3)), although the naive algorithm is often faster for small matrices.
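For reference, here is a minimal sketch of Strassen's scheme at the 2x2 level (the function name strassen2x2 is just for illustration). Seven multiplications m1..m7 replace the usual eight, and applying the same scheme recursively to matrix blocks is what yields the O(n^2.807) bound:

/* Strassen's seven-product scheme for a single 2x2 multiplication.
   Recursing on matrix blocks instead of scalars gives the full algorithm. */
void strassen2x2(const int a[2][2], const int b[2][2], int c[2][2])
{
    int m1 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1]);
    int m2 = (a[1][0] + a[1][1]) * b[0][0];
    int m3 = a[0][0] * (b[0][1] - b[1][1]);
    int m4 = a[1][1] * (b[1][0] - b[0][0]);
    int m5 = (a[0][0] + a[0][1]) * b[1][1];
    int m6 = (a[1][0] - a[0][0]) * (b[0][0] + b[0][1]);
    int m7 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1]);
    c[0][0] = m1 + m4 - m5 + m7;
    c[0][1] = m3 + m5;
    c[1][0] = m2 + m4;
    c[1][1] = m1 - m2 + m3 + m6;
}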
Once a block version of the matrix-matrix multiplication is implemented, one typically optimizes further by unrolling the innermost loop (i.e., instead of using a for loop to do 8 updates, writing the 8 updates directly in the program) to help the compiler pipeline the instructions on the CPU; see the sketch below.
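Here is a minimal sketch of what that could look like for the code above, assuming the same TSIZE arrays and the fixed block size B == 8; block_update is a hypothetical helper that the three blocked outer loops would call:

/* One 8x8 block update with the innermost k1 loop written out as eight
   explicit multiply-accumulates. The eight products are independent,
   which gives the compiler room to pipeline them. */
static void block_update(int c[][TSIZE], int a[][TSIZE], int b[][TSIZE],
                         int i, int j, int k)
{
    int i1, j1;
    for (i1 = i; i1 < i + 8; i1++)
        for (j1 = j; j1 < j + 8; j1++)
        {
            int sum = a[i1][k]     * b[k][j1]
                    + a[i1][k + 1] * b[k + 1][j1]
                    + a[i1][k + 2] * b[k + 2][j1]
                    + a[i1][k + 3] * b[k + 3][j1]
                    + a[i1][k + 4] * b[k + 4][j1]
                    + a[i1][k + 5] * b[k + 5][j1]
                    + a[i1][k + 6] * b[k + 6][j1]
                    + a[i1][k + 7] * b[k + 7][j1];
            c[i1][j1] += sum;
        }
}

In practice, measure before and after: modern compilers often perform this kind of unrolling themselves at -O2/-O3.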
Matrix multiplication is not commutative. In other words, the order in which two matrices are multiplied matters! For example, with A = [[1, 1], [0, 1]] and B = [[1, 0], [1, 1]], AB = [[2, 1], [1, 1]] but BA = [[1, 1], [1, 2]].
If matrices A and B are partitioned compatibly into blocks, the product C = AB can be computed by matrix multiplication using the blocks as entries: each block C_ij is the sum over k of A_ik * B_kj, which is exactly what the blocked code above exploits.
As matrices grow larger, the number of multiplications needed to find their product (n^3 for the standard algorithm on n-by-n matrices) grows faster than the number of additions (n^3 - n^2). While it takes eight intermediate multiplications to find the product of two 2-by-2 matrices, it takes 64 to find the product of two 4-by-4 matrices.
Linear algebra is one of the most common workloads computers perform. In games and graphics libraries it is THE most common operation. It has been studied and optimized heavily, with entire research groups dedicated to it.
If you care about speed, you should be performing matrix multiplication with a BLAS library. Some of the things that a BLAS library will optimize for:

- cache-friendly memory access patterns (blocking tuned to each cache level)
- SIMD vector instructions
- instruction pipelining, e.g. through loop unrolling

Note that parallelization is not on the list. This is because in today's computers memory access is slower than the CPU, so a single core can already keep the memory bus busy. You will see worse performance with OpenMP due to the overhead of context switching.
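As an illustration, a call into BLAS could look like the following minimal sketch, assuming a CBLAS implementation such as OpenBLAS is installed, and switching to double since BLAS has no integer GEMM (the TSIZE value and the function name matrix_multi_blas are assumptions for this example):

#include <cblas.h>   /* assumes a CBLAS header, e.g. from OpenBLAS */

#define TSIZE 512    /* assumed value; the question defines TSIZE elsewhere */

/* Computes c = a * b for row-major TSIZE x TSIZE matrices of doubles:
   dgemm performs c = alpha * a * b + beta * c. */
void matrix_multi_blas(const double *a, const double *b, double *c)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                TSIZE, TSIZE, TSIZE,
                1.0, a, TSIZE, b, TSIZE,
                0.0, c, TSIZE);
}

Link against the library when compiling, e.g. gcc -O2 matmul.c -lopenblas.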
It seems that you are still far from fully optimized. Have you tried loop unrolling, loop inversion, etc.?
You could refer to the following link for a step-by-step optimization of matrix multiplication:
http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm/