Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faster Matrix Multiplication in C#

I have as small c# project that involves matrices. I am processing large amounts of data by splitting it into n-length chunks, treating the chucks as vectors, and multiplying by a Vandermonde** matrix. The problem is, depending on the conditions, the size of the chucks and corresponding Vandermonde** matrix can vary. I have a general solution which is easy to read, but way too slow:

    public byte[] addBlockRedundancy(byte[] data) {
        if (data.Length!=numGood) D.error("Expecting data to be just "+numGood+" bytes long");

        aMatrix d=aMatrix.newColumnMatrix(this.mod, data);
        var r=vandermonde.multiplyBy(d);
        return r.ToByteArray();
    }//method

This can process about 1/4 megabytes per second on my i5 U470 @ 1.33GHz. I can make this faster by manually inlining the matrix multiplication:

        int o=0;
        int d=0;
        for (d=0; d<data.Length-numGood; d+=numGood) {
            for (int r=0; r<numGood+numRedundant; r++) {
                Byte value=0;
                for (int c=0; c<numGood; c++) {
                    value=mod.Add(value, mod.Multiply(vandermonde.get(r, c), data[d+c]));
                }//for
                output[r][o]=value;
            }//for
            o++;
        }//for

This can process about 1 meg a second.

(Please note the "mod" is performing operations over GF(2^8) modulo my favorite irreducible polynomial.)

I know this can get a lot faster: After all, the Vandermonde** matrix is mostly zeros. I should be able to make a routine, or find a routine, that can take my matrix and return a optimized method which will effectively multiply vectors by the given matrix, but faster. Then, when I give this routine a 5x5 Vandermonde matrix (the identity matrix), there is simply no arithmetic to perform, and the original data is just copied.

** Please note: What I use the term "Vandermonde", I actually mean an Identity matrix with some number of rows from the Vandermonde matrix appended (see comments). This matrix is wonderful because of all the zeros, and because if you remove enough rows (of your choosing) to make it square, it is an invertible matrix. And, of course, I would like to use this same routine to convert any one of those inverted matrices into an optimized series of instructions.

How can I make this matrix multiplication faster?

Thanks!

(edited to correct my mistake with Vandermonde matrix)

like image 314
Kyle Lahnakoski Avatar asked Dec 29 '10 10:12

Kyle Lahnakoski


2 Answers

Maybe you can define a matrix interface and build implementations at runtime using Reflection.Emit.

IMatrix m = MatrixGenerator.CreateMatrix(data);

m.multiplyBy(...)

Here, MatrixGenerator.CreateMatrix will create a tailored IMatrix implementation, with full loop unrolling, and further code pruning (0 cell, identity, etc). MatrixGenerator.CreateMatrix may cache matrices to avoid recreating it later for the same set of data.

like image 142
Nicolas Repiquet Avatar answered Sep 26 '22 20:09

Nicolas Repiquet


I've seen solutions using Reflection.Emit, and I've seen solutions which involve TPL. The real answer here is, for most situations, that you want to use an existing unmanaged library such as Intel MKL via P/Invoke. Alternatively, if you are using the GPU, you can go with the GPGPU approach which would go a lot faster.

And yes, SSE together with multi-core processing is the fastest way to do it on a CPU. But I wouldn't recommend writing your own algorithm - instead, go look for something that's already out there. Most likely, it will end up being a C++ library, possibly with a C# wrapper.

like image 20
Dmitri Nesteruk Avatar answered Sep 24 '22 20:09

Dmitri Nesteruk