
How to write a CUDA kernel for convolutions?

I am completely new to CUDA, and I would like to write a CUDA kernel that calculates a convolution given an input matrix, a convolution kernel (or filter), and an output matrix.

Note: I want each thread of the CUDA kernel to calculate one value in the output matrix.

How can I do this?

Bilgin asked Nov 15 '25 22:11


2 Answers

I would like to write a cuda kernel that calculates a convolution given an input matrix, convolution (or filter) and an output matrix.

You might be interested in this treatment of the subject (although it's a little old), or in the CUDA convolution kernel sample programs: non-separable and separable.

I want each thread of the cuda kernel to calculate one value in the output matrix.

If you follow the link, you'll see that you don't quite want that. In other words: don't make rigid assumptions about how your kernel should divide work among the threads; you might change your mind later.
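For reference, the one-thread-per-output mapping described in the question can be sketched roughly as below. This is only a naive baseline written for illustration, not the approach of the linked material or of the CUDA samples. It assumes row-major float matrices, "valid" mode (no padding, so the output is (inH - fH + 1) x (inW - fW + 1)), and an unflipped filter (strictly speaking a cross-correlation). The names conv2dNaive, conv2dHost and checkCuda are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Naive kernel: each thread computes exactly one element of the output matrix.
// Assumptions: row-major storage, no padding, filter applied without flipping.
__global__ void conv2dNaive(const float* in, const float* filt, float* out,
                            int inH, int inW, int fH, int fW)
{
    const int outH = inH - fH + 1;
    const int outW = inW - fW + 1;
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    const int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= outH || col >= outW) return;  // threads outside the output do nothing

    float acc = 0.0f;
    for (int i = 0; i < fH; ++i)
        for (int j = 0; j < fW; ++j)
            acc += in[(row + i) * inW + (col + j)] * filt[i * fW + j];
    out[row * outW + col] = acc;
}

// Hypothetical helper: turn a CUDA error code into an exception.
static void checkCuda(cudaError_t e)
{
    if (e != cudaSuccess) throw std::runtime_error(cudaGetErrorString(e));
}

// Hypothetical host wrapper: copies data to the GPU, launches the kernel,
// and copies the result back. Assumes the filter is no larger than the input.
// (Device buffers are not released if an error is thrown; fine for a sketch.)
std::vector<float> conv2dHost(const std::vector<float>& in, int inH, int inW,
                              const std::vector<float>& filt, int fH, int fW)
{
    const int outH = inH - fH + 1;
    const int outW = inW - fW + 1;
    std::vector<float> out(static_cast<size_t>(outH) * outW);

    float *dIn = nullptr, *dFilt = nullptr, *dOut = nullptr;
    checkCuda(cudaMalloc(&dIn, in.size() * sizeof(float)));
    checkCuda(cudaMalloc(&dFilt, filt.size() * sizeof(float)));
    checkCuda(cudaMalloc(&dOut, out.size() * sizeof(float)));
    checkCuda(cudaMemcpy(dIn, in.data(), in.size() * sizeof(float),
                         cudaMemcpyHostToDevice));
    checkCuda(cudaMemcpy(dFilt, filt.data(), filt.size() * sizeof(float),
                         cudaMemcpyHostToDevice));

    // One thread per output element, rounded up to whole 16x16 blocks.
    const dim3 block(16, 16);
    const dim3 grid((outW + block.x - 1) / block.x,
                    (outH + block.y - 1) / block.y);
    conv2dNaive<<<grid, block>>>(dIn, dFilt, dOut, inH, inW, fH, fW);
    checkCuda(cudaGetLastError());

    // This blocking copy also waits for the kernel to finish.
    checkCuda(cudaMemcpy(out.data(), dOut, out.size() * sizeof(float),
                         cudaMemcpyDeviceToHost));
    cudaFree(dIn);
    cudaFree(dFilt);
    cudaFree(dOut);
    return out;
}
```

Calling conv2dHost with a 3 x 3 input and a 2 x 2 filter yields a 2 x 2 result. Every thread here re-reads its whole input window from global memory, which is the main reason tiled or shared-memory variants are usually preferred once the basic version works.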

einpoklum answered Nov 17 '25 22:11


If the filters cover the full range of the matrix (that is, each filter has the same size as the input), the computation can be converted directly into a call to cublasSgemm.

For example, suppose the matrix has dimensions 5 * 4 and you need 130 filters. Then the filter matrix to be trained has dimensions 130 * 20, and the 5 * 4 matrix can be treated as a 20 * 1 matrix.

The computation is thereby converted to a matrix multiplication between m1 (130, 20) and m2 (20, 1), which a tuned library routine can execute very efficiently.
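A minimal sketch of that idea, assuming row-major float data on the host and linking against cuBLAS (-lcublas). The function name applyFullFilters and the helper cudaOk are hypothetical. Because cuBLAS expects column-major storage, the row-major filter buffer is passed with CUBLAS_OP_T rather than being physically transposed.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: turn a CUDA error code into an exception.
static void cudaOk(cudaError_t e)
{
    if (e != cudaSuccess) throw std::runtime_error(cudaGetErrorString(e));
}

// Computes out = filters * input, where
//   filters is numFilters x n (row-major), e.g. 130 x 20,
//   input   is the flattened matrix of n values, e.g. 5 * 4 = 20.
// (Resources are not released if an error is thrown; fine for a sketch.)
std::vector<float> applyFullFilters(const std::vector<float>& filters,
                                    const std::vector<float>& input,
                                    int numFilters, int n)
{
    std::vector<float> out(numFilters);
    float *dF = nullptr, *dX = nullptr, *dY = nullptr;
    cudaOk(cudaMalloc(&dF, filters.size() * sizeof(float)));
    cudaOk(cudaMalloc(&dX, input.size() * sizeof(float)));
    cudaOk(cudaMalloc(&dY, out.size() * sizeof(float)));
    cudaOk(cudaMemcpy(dF, filters.data(), filters.size() * sizeof(float),
                      cudaMemcpyHostToDevice));
    cudaOk(cudaMemcpy(dX, input.data(), input.size() * sizeof(float),
                      cudaMemcpyHostToDevice));

    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS)
        throw std::runtime_error("cublasCreate failed");

    // cuBLAS sees the row-major (numFilters x n) buffer as a column-major
    // (n x numFilters) matrix with leading dimension n, so we ask for its
    // transpose: (numFilters x n) * (n x 1) = (numFilters x 1).
    const float alpha = 1.0f, beta = 0.0f;
    const cublasStatus_t st = cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                                          numFilters, 1, n,
                                          &alpha, dF, n, dX, n,
                                          &beta, dY, numFilters);
    if (st != CUBLAS_STATUS_SUCCESS)
        throw std::runtime_error("cublasSgemm failed");

    // This blocking copy also waits for the GEMM to finish.
    cudaOk(cudaMemcpy(out.data(), dY, out.size() * sizeof(float),
                      cudaMemcpyDeviceToHost));
    cublasDestroy(handle);
    cudaFree(dF);
    cudaFree(dX);
    cudaFree(dY);
    return out;
}
```

For the example above, numFilters would be 130 and n would be 20. With a single input column a matrix-vector routine would also work; the GEMM form becomes more useful when several flattened inputs are stacked as extra columns of m2.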

Tom answered Nov 17 '25 22:11


