Simple and fast matrix-vector multiplication in C / C++

I call matrix_vector_mult() very frequently. It multiplies a matrix by a vector, and its implementation is below.

Question: Is there a simple way to make it significantly faster, by at least a factor of two?

Remarks: 1) The size of the matrix is about 300x50. It doesn't change during the run. 2) It must work on both Windows and Linux.

double vectors_dot_prod(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i;
    for (i = 0; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}

void matrix_vector_mult(const double **mat, const double *vec, double *result, int rows, int cols)
{ // in matrix form: result = mat * vec;
    int i;
    for (i = 0; i < rows; i++)
    {
        result[i] = vectors_dot_prod(mat[i], vec, cols);
    }
}

asked Sep 05 '12 20:09

Serg

2 Answers

In theory, a good compiler should do this by itself. However, I tried it on my system (g++ 4.6.3) and got about twice the speed on a 300x50 matrix by hand-unrolling 4 multiplications (about 18 µs per matrix instead of 34 µs per matrix):

double vectors_dot_prod2(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i = 0;
    for (; i <= n-4; i+=4)
    {
        res += (x[i] * y[i] +
                x[i+1] * y[i+1] +
                x[i+2] * y[i+2] +
                x[i+3] * y[i+3]);
    }
    for (; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}

However, I expect the results of this level of micro-optimization to vary wildly between systems.
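A related variant that may be worth timing alongside vectors_dot_prod2 keeps four independent partial sums instead of one, so the additions do not all wait on a single running total. This is only a sketch: the name vectors_dot_prod4acc is made up for illustration, whether it beats the version above depends on the compiler and CPU, and because it changes the order of the floating-point additions the result can differ from the simple loop in the last bits.

```c
#include <assert.h>

/* Hypothetical variant of vectors_dot_prod2 (name invented for this sketch):
 * four independent accumulators, combined once at the end. */
double vectors_dot_prod4acc(const double *x, const double *y, int n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int i = 0;

    assert(n >= 0);

    /* Main loop: each accumulator only depends on its own previous value. */
    for (; i + 4 <= n; i += 4)
    {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
    {
        s0 += x[i] * y[i];
    }

    return (s0 + s1) + (s2 + s3);
}
```

It is a drop-in replacement for vectors_dot_prod inside matrix_vector_mult, so it can be benchmarked the same way as the unrolled version.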

120

answered Sep 28 '22 10:09

6502

As Zhenya says, just use a good BLAS or matrix math library.

If for some reason you can't do that, see if your compiler can unroll and/or vectorize your loops. Making sure rows and cols are both constants at the call site may help, assuming the functions you posted are available for inlining.
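As a rough sketch of the "constants at the call site" idea: the two functions from the question are marked static inline so the compiler can see them at the call, and a small wrapper passes the sizes as compile-time constants. The names ROWS, COLS and matrix_vector_mult_300x50 are assumptions made up for this example, and the exact sizes are taken from the "about 300x50" in the question. Whether the compiler then actually unrolls or vectorizes depends on the compiler and its optimization flags, so check the generated code or just time it.

```c
#include <assert.h>

/* Assumed fixed sizes, taken from the "about 300x50" in the question. */
enum { ROWS = 300, COLS = 50 };

/* Same code as in the question, but visible for inlining. */
static inline double vectors_dot_prod(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i;
    for (i = 0; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}

static inline void matrix_vector_mult(const double **mat, const double *vec,
                                      double *result, int rows, int cols)
{
    int i;
    for (i = 0; i < rows; i++)
    {
        result[i] = vectors_dot_prod(mat[i], vec, cols);
    }
}

/* Hypothetical wrapper: after inlining, both loop bounds are known constants. */
void matrix_vector_mult_300x50(const double **mat, const double *vec,
                               double *result)
{
    assert(mat && vec && result);
    matrix_vector_mult(mat, vec, result, ROWS, COLS);
}
```

The wrapper keeps the original interface for callers, so the rest of the program only has to switch to the fixed-size entry point.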

If you still can't get the speedup you need, you're looking at manual unrolling, and vectorizing using extensions or inline assembler.
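For what "vectorizing using extensions" can look like, here is a minimal sketch with SSE2 intrinsics, which both GCC and MSVC provide through emmintrin.h on x86. The function name vectors_dot_prod_sse2 and the HAVE_SSE2 macro are invented for this example. It uses unaligned loads so it makes no assumption about how the rows were allocated, and it falls back to the plain loop when SSE2 is not available. As with any reordering of the sum, the result can differ from the scalar loop in the last bits, and the actual gain has to be measured.

```c
#include <assert.h>

/* Detect SSE2 on GCC/Clang (__SSE2__) and MSVC (_M_X64 or _M_IX86_FP >= 2). */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Hypothetical SSE2 dot product (name invented for this sketch). */
double vectors_dot_prod_sse2(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i = 0;
#if HAVE_SSE2
    __m128d acc = _mm_setzero_pd();   /* two running partial sums */
    double lanes[2];
#endif

    assert(n >= 0);

#if HAVE_SSE2
    /* Process two doubles per iteration. */
    for (; i + 2 <= n; i += 2)
    {
        __m128d vx = _mm_loadu_pd(x + i);
        __m128d vy = _mm_loadu_pd(y + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(vx, vy));
    }
    /* Add the two lanes together. */
    _mm_storeu_pd(lanes, acc);
    res = lanes[0] + lanes[1];
#endif

    /* Scalar tail (or the whole vector when SSE2 is unavailable). */
    for (; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}
```

Like the other variants, it can be swapped in for vectors_dot_prod inside matrix_vector_mult without touching the callers.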

answered Sep 28 '22 11:09

Useless
