 

Matlab + CUDA slow in solving matrix-vector equation A*x=B

I am solving the equation A*x=B, where A is a matrix, B is a vector, and x is the unknown vector.

Hardware specs: Intel i7 3630QM (4 cores), nVidia GeForce GT 640M (384 CUDA cores)

Here's an example:

>> A=rand(5000);

>> B=rand(5000,1);

>> Agpu=gpuArray(A);

>> Bgpu=gpuArray(B);

>> tic;A\B;toc;

Elapsed time is 1.382281 seconds.

>> tic;Agpu\Bgpu;toc;

Elapsed time is 4.775395 seconds.

Somehow the GPU is much slower... Why? It is also slower for FFT, INV, and LU calculations, which should be related to matrix division.

However, the GPU is much faster at matrix multiplication (same data):

>> tic;A*B;toc;

Elapsed time is 0.014700 seconds.

>> tic;Agpu*Bgpu;toc;

Elapsed time is 0.000505 seconds.

The main question is: why is GPU A\B (mldivide) so slow compared to the CPU?

UPDATED

Here are some more results when A, B (on CPU), AA, BB (on GPU) are rand(5000):

>> tic;fft(A);toc;
Elapsed time is *0.117189* seconds.
>> tic;fft(AA);toc;
Elapsed time is 1.062969 seconds.
>> tic;fft(AA);toc;
Elapsed time is 0.542242 seconds.
>> tic;fft(AA);toc;
Elapsed time is *0.229773* seconds.
>> tic;fft(AA);toc;

The bold times are the stable ones, yet the GPU is still almost twice as slow. By the way, why is the GPU even slower on the first two attempts? Is it compiled twice at first?
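A minimal sketch of a fairer timing setup (my own example, assuming the Parallel Computing Toolbox): gpuArray operations launch asynchronously, and the first call to a function pays a one-time setup cost, so warming up the call and synchronizing the device before stopping the timer gives more stable numbers.

AA = gpuArray(rand(5000));

fft(AA);              % warm-up call, pays the one-time setup cost
wait(gpuDevice);      % make sure the GPU is idle before timing

tic;
F = fft(AA);
wait(gpuDevice);      % block until the asynchronous GPU work has finished
toc;

% Newer releases also provide gputimeit, which handles warm-up and
% synchronization itself:
% t = gputimeit(@() fft(AA));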

In addition:

>> tic;sin(A);toc;
Elapsed time is *0.121008* seconds.
>> tic;sin(AA);toc;
Elapsed time is 0.020448 seconds.
>> tic;sin(AA);toc;
Elapsed time is 0.157209 seconds.
>> tic;sin(AA);toc;
Elapsed time is *0.000419* seconds.

After two runs, the GPU is incredibly fast at sin calculations.

So, still, why is the GPU so slow at matrix division, FFT, and similar calculations, while it is so fast at matrix multiplication and trigonometry? The question should not even have to arise... the GPU ought to be faster for all of these calculations, since Matlab provides overloaded GPU versions of these functions (mldivide, fft).

Could somebody help me solve these issues, please? :)

asked Feb 16 '13 by Aurimas Šimkus

2 Answers

Please read up on how Matlab computes such solutions; it will help you understand why the GPU is slower.

I'll try to say it in a few words.

A*x=B becomes L*(U*x)=B, where L*U=A and y=U*x.

  1. So Matlab factorizes A into L*U. (As far as I know, this process cannot be fully parallelized; only some of its steps can run in parallel, due to the nature of the algorithm.)
  2. Then Matlab solves L*y=B and finds y. (This process cannot be parallelized, as each step requires data from the previous one.)
  3. Then Matlab solves U*x=y and finds x. (This process cannot be parallelized, as each step requires data from the previous one.)

So, since the GPU's clock is slower than the CPU's and these processes cannot be parallelized, the CPU is faster. And no, unless you come up with a better method (good luck!), the GPU will always be slower, except in some very specific cases.
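For illustration, the three steps above can be written out explicitly in Matlab with the built-in lu factorization (a sketch only; mldivide chooses its solver internally, and the variable names are mine):

A = rand(5000);
B = rand(5000,1);

[L, U, P] = lu(A);    % step 1: factorize A (P is the row-pivoting permutation)
y = L \ (P*B);        % step 2: forward substitution, inherently sequential
x = U \ y;            % step 3: backward substitution, inherently sequential

% x should match A\B up to rounding error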

answered Sep 27 '22 by ntarki

Part 1 of the explanation is in the answer from user2230360, but your question is twofold, so I'll add a bit about the multiplication.

As noted already, the LU factorization is not easily parallelized, even if some of its steps can be. Matrix multiplication, however, is very parallelizable. If you work with these things, you should be able to do matrix multiplication by hand, and then you will know that the elements of the matrix C in A*B=C can be computed in any order you want, hence the possibility of parallel computation. That is probably why you are seeing lightning-fast multiplication but slow solving of linear systems: one cannot be parallelized as much as the other.
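To make that concrete, here is a toy sketch (my own example, not from the original answer) showing that every element of C in A*B=C depends only on one row of A and one column of B, so all elements can be computed independently, and therefore in parallel:

A = rand(4);
B = rand(4);
C = zeros(4);
for i = 1:4
    for j = 1:4
        C(i,j) = A(i,:) * B(:,j);   % depends on no other element of C
    end
end
% norm(C - A*B) is at rounding-error level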

answered Sep 27 '22 by pkofod