Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simple and fast matrix-vector multiplication in C / C++

Tags:

I need frequent usage of matrix_vector_mult() which multiplies matrix with vector, and below is its implementation.

Question: Is there a simple way to make it significantly, at least twice, faster?

Remarks: 1) The size of the matrix is about 300x50. It doesn't change during the run. 2) It must work on both Windows and Linux.

double vectors_dot_prod(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i;
    for (i = 0; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}

void matrix_vector_mult(const double **mat, const double *vec, double *result, int rows, int cols)
{ // in matrix form: result = mat * vec;
    int i;
    for (i = 0; i < rows; i++)
    {
        result[i] = vectors_dot_prod(mat[i], vec, cols);
    }
}
like image 689
Serg Avatar asked Sep 05 '12 20:09

Serg


People also ask

How do you multiply a matrix by a vector?

By the definition, number of columns in A equals the number of rows in y . First, multiply Row 1 of the matrix by Column 1 of the vector. Next, multiply Row 2 of the matrix by Column 1 of the vector. Finally multiply Row 3 of the matrix by Column 1 of the vector.

What is the complexity of matrix vector multiplication?

Assuming that each operation for an field F can be done in O(1) time, this implies that the worst-case complexity of matrix-vector multiplication is Θ(mn).

How do you multiply vectors by vectors?

Alternatively, it is defined as the product of the projection of the first vector onto the second vector and the magnitude of the second vector. Thus, A ⋅ B = |A| |B| cos θ


2 Answers

This is something that in theory a good compiler should do by itself, however I made a try with my system (g++ 4.6.3) and got about twice the speed on a 300x50 matrix by hand unrolling 4 multiplications (about 18us per matrix instead of 34us per matrix):

double vectors_dot_prod2(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i = 0;
    for (; i <= n-4; i+=4)
    {
        res += (x[i] * y[i] +
                x[i+1] * y[i+1] +
                x[i+2] * y[i+2] +
                x[i+3] * y[i+3]);
    }
    for (; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}

I expect however the results of this level of micro-optimization to vary wildly between systems.

like image 120
6502 Avatar answered Sep 28 '22 10:09

6502


As Zhenya says, just use a good BLAS or matrix math library.

If for some reason you can't do that, see if your compiler can unroll and/or vectorize your loops; making sure rows and cols are both constants at the call site may help, assuming the functions you posted are available for inlining

If you still can't get the speedup you need, you're looking at manual unrolling, and vectorizing using extensions or inline assembler.

like image 43
Useless Avatar answered Sep 28 '22 11:09

Useless