I'm new to parallel programming on the GPU, so I apologize if the question is broad or vague. I'm aware there is a parallel SVD function in the CULA library, but what should the strategy be if I have a large number of relatively small matrices to factorize? For example, I have n matrices of dimension d, where n is large and d is small. How can I parallelize this process? Could anyone give me a hint?
You can take a look at the Batched Operations post on the CULA blog for a discussion of your problem.
EDIT
From what I understand from your comment below, you would like each thread to calculate a separate SVD. So, basically, each thread should execute a standard, sequential SVD scheme. Some possibly useful references for that (a minimal sketch of this approach follows the references):
Numerical Recipes
Golub, Van Loan, Matrix Computations
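To make the idea concrete, here is a minimal sketch of the per-thread approach: one thread per matrix, each running a sequential one-sided (Hestenes) Jacobi SVD on its own. The dimension D, the number of sweeps, the column-major layout and all names are my own assumptions for illustration; this is not a tuned implementation, and it only produces the singular values (accumulating V and normalizing U are straightforward extensions).

    // Hypothetical sketch: each thread computes a sequential one-sided
    // Jacobi SVD of its own small D x D matrix. D, NUM_SWEEPS and the
    // column-major layout are assumptions made for this example.
    #include <cmath>
    #include <cuda_runtime.h>

    #define D 4            // assumed (small) matrix dimension
    #define NUM_SWEEPS 15  // assumed fixed number of Jacobi sweeps

    // A holds n stacked D x D matrices, column-major; on exit its columns
    // are sigma_i * u_i and s holds the D singular values per matrix.
    __global__ void batchedJacobiSVD(float* A, float* s, int n)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= n) return;

        float a[D][D];  // a[col][row]: this thread's private copy
        for (int c = 0; c < D; ++c)
            for (int r = 0; r < D; ++r)
                a[c][r] = A[idx * D * D + c * D + r];

        // Cyclic sweeps: rotate every column pair (p, q) so the two
        // columns become orthogonal (Hestenes one-sided Jacobi).
        for (int sweep = 0; sweep < NUM_SWEEPS; ++sweep)
            for (int p = 0; p < D - 1; ++p)
                for (int q = p + 1; q < D; ++q) {
                    float alpha = 0.f, beta = 0.f, gamma = 0.f;
                    for (int r = 0; r < D; ++r) {
                        alpha += a[p][r] * a[p][r];
                        beta  += a[q][r] * a[q][r];
                        gamma += a[p][r] * a[q][r];
                    }
                    if (fabsf(gamma) < 1e-12f) continue;  // already orthogonal
                    float zeta = (beta - alpha) / (2.f * gamma);
                    float t = copysignf(1.f, zeta)
                            / (fabsf(zeta) + sqrtf(1.f + zeta * zeta));
                    float cs = rsqrtf(1.f + t * t);  // cosine
                    float sn = cs * t;               // sine
                    for (int r = 0; r < D; ++r) {
                        float ap = a[p][r], aq = a[q][r];
                        a[p][r] = cs * ap - sn * aq;
                        a[q][r] = sn * ap + cs * aq;
                    }
                }

        // After convergence the singular values are the column norms.
        for (int c = 0; c < D; ++c) {
            float nrm = 0.f;
            for (int r = 0; r < D; ++r) nrm += a[c][r] * a[c][r];
            s[idx * D + c] = sqrtf(nrm);
            for (int r = 0; r < D; ++r)
                A[idx * D * D + c * D + r] = a[c][r];
        }
    }

An assumed launch would be batchedJacobiSVD<<<(n + 255) / 256, 256>>>(d_A, d_s, n);, i.e. one thread per matrix. This deliberately ignores parallelism inside each small SVD, which is exactly the trade-off of the per-thread batched approach.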
If you use this approach, though, I'm afraid you will no longer be able to use cuBLAS, as those are host functions that cannot be called from the device (unless you have compute capability >= 3.5; see the simpleDevLibCUBLAS example). But basically, in this way you would be implementing the batch concept by yourself.
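For completeness, a quick host-side check of whether your device meets that requirement (assuming device 0) could look like this:

    // Check whether device 0 supports calling cuBLAS from device code
    // (dynamic parallelism, compute capability 3.5 or higher).
    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        printf("Compute capability: %d.%d\n", prop.major, prop.minor);
        if (prop.major > 3 || (prop.major == 3 && prop.minor >= 5))
            printf("Device-callable cuBLAS is supported.\n");
        else
            printf("cuBLAS can only be called from the host on this device.\n");
        return 0;
    }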
If you decide to go with a more standard parallel GPU implementation, the reference below could be of interest:
Singular Value Decomposition on GPU using CUDA