I have to multiply a very small sized matrix ( size - 10x10 ) with a vector several times 50000 to 100000 times ( could even be more than that). This happens for 1000 different matrices (could be much more). Would there be any significant performance gain by doing this operation on CUDA.
Yes, it's an ideal task for the GPU.
If you want to multiply a single matrix with a vector 50K times and each multiplication is prerequisite to the previous then don't use CUDA. It's a serial problem, best suites for CPU. However if each multiplication is independent you can multiply them simultaneously on CUDA.
The only case where your program will give tremendous speedup is when each vector multiplication iteration is independent to the data of other iterations. This way you'll be able to launch 50K or more iterations simultaneously by launching equal number of threads.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With