
Parallel implementation for multiple SVDs using CUDA

I'm new to parallel programming on the GPU, so I apologize if the question is broad or vague. I'm aware there are some parallel SVD functions in the CULA library, but what should the strategy be if I have a large number of relatively small matrices to factorize? For example, I have n matrices of dimension d × d, where n is large and d is small. How can I parallelize this process? Could anyone give me a hint?

asked Jul 01 '13 by Logan Yang
1 Answer

You can take a look at the Batched Operations post of the CULA blog for a discussion of your problem.
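As a rough, hedged illustration of the batched idea, the sketch below uses NVIDIA's cuSOLVER gesvdjBatched routine instead of CULA (the CULA batched API itself is only described in the blog post); the array names, dimensions, and batch size are placeholders of my own, not anything from the original post:

```cuda
// Hedged sketch: batched SVD of many small d x d matrices in one call,
// using cuSOLVER's Jacobi-based gesvdjBatched routine (not CULA).
#include <cuda_runtime.h>
#include <cusolverDn.h>

int main()
{
    const int d = 8;          // small matrix dimension (gesvdjBatched requires d <= 32)
    const int batch = 10000;  // number of matrices (the "large n")

    // Device arrays holding all matrices contiguously, column-major, d x d each.
    float *d_A, *d_U, *d_V, *d_S;
    int *d_info;
    cudaMalloc(&d_A, sizeof(float) * d * d * batch);
    cudaMalloc(&d_U, sizeof(float) * d * d * batch);
    cudaMalloc(&d_V, sizeof(float) * d * d * batch);
    cudaMalloc(&d_S, sizeof(float) * d * batch);
    cudaMalloc(&d_info, sizeof(int) * batch);
    // ... fill d_A with the n input matrices ...

    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    gesvdjInfo_t params;
    cusolverDnCreateGesvdjInfo(&params);

    // Query and allocate the workspace, then factorize all matrices at once.
    int lwork = 0;
    cusolverDnSgesvdjBatched_bufferSize(handle, CUSOLVER_EIG_MODE_VECTOR,
                                        d, d, d_A, d, d_S, d_U, d, d_V, d,
                                        &lwork, params, batch);
    float *d_work;
    cudaMalloc(&d_work, sizeof(float) * lwork);

    cusolverDnSgesvdjBatched(handle, CUSOLVER_EIG_MODE_VECTOR,
                             d, d, d_A, d, d_S, d_U, d, d_V, d,
                             d_work, lwork, d_info, params, batch);
    cudaDeviceSynchronize();

    cusolverDnDestroyGesvdjInfo(params);
    cusolverDnDestroy(handle);
    cudaFree(d_A); cudaFree(d_U); cudaFree(d_V);
    cudaFree(d_S); cudaFree(d_work); cudaFree(d_info);
    return 0;
}
```

Note that gesvdjBatched is restricted to small matrices (m, n ≤ 32), which matches the "large n, small d" setting of the question.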

EDIT

From what I understand from your comment below, you would like each thread to calculate a separate SVD. So, basically, each thread should execute a standard, sequential SVD scheme. For that, some possibly useful references are:

Numerical Recipes

Golub, Van Loan, Matrix Computations

If you use this approach, though, I'm afraid you will no longer be able to use cuBLAS, as its routines are host functions that cannot be called from the device (unless you have a device of compute capability >= 3.5; see the simpleDevLibCUBLAS example). But basically, in this way you would be implementing the batch concept yourself.
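To make the per-thread idea concrete, here is a minimal, hedged sketch of a kernel in which each thread runs a sequential one-sided Jacobi SVD on its own small d × d matrix; the kernel name, the fixed dimension D, the fixed sweep count, and the row-major layout are assumptions for illustration, not anything from the original post:

```cuda
// Hedged sketch: one thread = one sequential one-sided Jacobi SVD of a
// small D x D matrix stored in thread-local memory. D, N_MAT and SWEEPS
// are illustrative placeholders.
#define D      4      // matrix dimension (assumed small)
#define N_MAT  1024   // number of matrices
#define SWEEPS 30     // fixed number of Jacobi sweeps

__global__ void svd_batch(const float* __restrict__ A_all,
                          float* __restrict__ U_all,
                          float* __restrict__ S_all,
                          float* __restrict__ V_all)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= N_MAT) return;

    // Working copies: U starts as A (row-major, matrix k at offset k*D*D), V as identity.
    float U[D][D], V[D][D];
    for (int i = 0; i < D; ++i)
        for (int j = 0; j < D; ++j) {
            U[i][j] = A_all[k * D * D + i * D + j];
            V[i][j] = (i == j) ? 1.0f : 0.0f;
        }

    // One-sided Jacobi: repeatedly orthogonalize every pair of columns (p, q).
    for (int sweep = 0; sweep < SWEEPS; ++sweep)
        for (int p = 0; p < D - 1; ++p)
            for (int q = p + 1; q < D; ++q) {
                float alpha = 0.f, beta = 0.f, gamma = 0.f;
                for (int i = 0; i < D; ++i) {
                    alpha += U[i][p] * U[i][p];
                    beta  += U[i][q] * U[i][q];
                    gamma += U[i][p] * U[i][q];
                }
                if (fabsf(gamma) < 1e-12f) continue;
                // Rotation angle that zeroes the inner product of columns p and q.
                float zeta = (beta - alpha) / (2.0f * gamma);
                float t = copysignf(1.0f, zeta) /
                          (fabsf(zeta) + sqrtf(1.0f + zeta * zeta));
                float c = rsqrtf(1.0f + t * t);
                float s = c * t;
                // Apply the rotation to columns p and q of U and V.
                for (int i = 0; i < D; ++i) {
                    float up = U[i][p], uq = U[i][q];
                    U[i][p] = c * up - s * uq;
                    U[i][q] = s * up + c * uq;
                    float vp = V[i][p], vq = V[i][q];
                    V[i][p] = c * vp - s * vq;
                    V[i][q] = s * vp + c * vq;
                }
            }

    // Singular values (unsorted) are the column norms of U; normalize U's columns.
    for (int j = 0; j < D; ++j) {
        float norm = 0.f;
        for (int i = 0; i < D; ++i) norm += U[i][j] * U[i][j];
        norm = sqrtf(norm);
        S_all[k * D + j] = norm;
        for (int i = 0; i < D; ++i)
            U_all[k * D * D + i * D + j] = U[i][j] / fmaxf(norm, 1e-20f);
    }
    for (int i = 0; i < D; ++i)
        for (int j = 0; j < D; ++j)
            V_all[k * D * D + i * D + j] = V[i][j];
}
```

With the four arrays allocated by cudaMalloc and A filled with the input matrices, this could be launched, for example, as svd_batch<<<(N_MAT + 255) / 256, 256>>>(d_A, d_U, d_S, d_V); each thread works entirely out of its own registers/local memory, which is exactly the "implement the batch concept yourself" idea described above.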

If you decide to go with a more standard parallel GPU implementation, the reference below could be of interest:

Singular Value Decomposition on GPU using CUDA

answered Oct 16 '22 by Vitality