I am totally new to CUDA and I would like to write a CUDA kernel that calculates a convolution, given an input matrix, a convolution kernel (or filter), and an output matrix.
Note: I want each thread of the CUDA kernel to calculate one value in the output matrix.
How can I do this?
"I would like to write a cuda kernel that calculates a convolution given an input matrix, convolution (or filter) and an output matrix."
You might be interested in this treatment of the subject (although it's a little old), or you could look at the CUDA convolution sample programs, which come in non-separable and separable versions.
"I want each thread of the cuda kernel to calculate one value in the output matrix."
If you follow the link, you'll realize you don't quite want that. In other words: don't make rigid assumptions about how your kernel should divide work among the threads; you might change your mind later.
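That said, as a baseline to start from, here is a minimal sketch of the one-thread-per-output-value approach the question asks about. It is not taken from the linked material. The names `conv2dNaive` and `runConvDemo`, the matrix sizes, and the 16 x 16 block shape are my own choices for illustration. I am also assuming row-major storage, no padding (so the output is smaller than the input), stride 1, and no flipping of the filter (which strictly makes it a cross-correlation).

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Each thread computes exactly one element of the output matrix.
// Assumptions: row-major storage, "valid" output (no padding), stride 1,
// and no filter flip (cross-correlation).
__global__ void conv2dNaive(const float* in, const float* filt, float* out,
                            int inH, int inW, int fH, int fW)
{
    const int outH = inH - fH + 1;
    const int outW = inW - fW + 1;
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    const int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= outH || col >= outW) return;  // threads past the edge do nothing

    float acc = 0.0f;
    for (int i = 0; i < fH; ++i)
        for (int j = 0; j < fW; ++j)
            acc += in[(row + i) * inW + (col + j)] * filt[i * fW + j];
    out[row * outW + col] = acc;
}

// Runs the kernel on a small example and returns the largest absolute
// difference from a plain CPU loop, or -1 if a CUDA call failed.
float runConvDemo()
{
    const int inH = 64, inW = 48, fH = 3, fW = 3;   // illustrative sizes
    const int outH = inH - fH + 1, outW = inW - fW + 1;
    std::vector<float> in(inH * inW), filt(fH * fW);
    std::vector<float> out(outH * outW), ref(outH * outW);
    for (int i = 0; i < inH * inW; ++i) in[i] = float(i % 7);
    for (int i = 0; i < fH * fW; ++i) filt[i] = float(i % 3) - 1.0f;

    // CPU reference using the same indexing as the kernel.
    for (int r = 0; r < outH; ++r)
        for (int c = 0; c < outW; ++c) {
            float acc = 0.0f;
            for (int i = 0; i < fH; ++i)
                for (int j = 0; j < fW; ++j)
                    acc += in[(r + i) * inW + (c + j)] * filt[i * fW + j];
            ref[r * outW + c] = acc;
        }

    float *dIn = nullptr, *dFilt = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, in.size() * sizeof(float));
    cudaMalloc(&dFilt, filt.size() * sizeof(float));
    cudaMalloc(&dOut, out.size() * sizeof(float));
    cudaMemcpy(dIn, in.data(), in.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dFilt, filt.data(), filt.size() * sizeof(float), cudaMemcpyHostToDevice);

    // Enough 16 x 16 blocks to cover every output element (rounded up).
    dim3 block(16, 16);
    dim3 grid((outW + block.x - 1) / block.x, (outH + block.y - 1) / block.y);
    conv2dNaive<<<grid, block>>>(dIn, dFilt, dOut, inH, inW, fH, fW);
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(out.data(), dOut, out.size() * sizeof(float),
                         cudaMemcpyDeviceToHost);
    cudaFree(dIn); cudaFree(dFilt); cudaFree(dOut);
    if (err != cudaSuccess) return -1.0f;

    float maxDiff = 0.0f;
    for (size_t k = 0; k < out.size(); ++k)
        maxDiff = fmaxf(maxDiff, fabsf(out[k] - ref[k]));
    return maxDiff;
}

int main()
{
    const float diff = runConvDemo();
    printf("max abs difference vs CPU: %g\n", diff);
    return (diff >= 0.0f && diff < 1e-5f) ? 0 : 1;
}
```

Every thread here reads its whole input window straight from global memory, so neighbouring threads load many of the same values. That redundancy is one reason to stay flexible about how work is split among threads.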
If each filter covers the full range of the matrix (that is, the filter is the same size as the input), then the computation can be converted directly to a cublasSgemm call.
For example, suppose the matrix has dimensions 5 * 4 and you need 130 filters. Then the filter matrix to be trained has dimensions 130 * 20, and the 5 * 4 matrix can be flattened to 20 * 1.
In this way, the computation becomes a matrix multiplication between m1 (130, 20) and m2 (20, 1), which lets an already tuned library routine do the work.
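A minimal sketch of that multiplication with cuBLAS follows, assuming the filter matrix is stored row-major on the host as 130 rows of 20 weights. The function name `runGemmDemo` and the test data are made up for illustration. cuBLAS expects column-major storage, so the sketch passes the row-major buffer with `CUBLAS_OP_T` and a leading dimension of 20. With only one input column, a matrix-vector routine such as `cublasSgemv` would also fit. Build with something like `nvcc demo.cu -lcublas`.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Computes y = W * x with W (130 x 20, row-major) and x (20 x 1), then
// returns the largest absolute difference from a CPU loop, or -1 on error.
float runGemmDemo()
{
    const int numFilters = 130, K = 20;   // K = 5 * 4 flattened input
    std::vector<float> W(numFilters * K), x(K), y(numFilters), ref(numFilters, 0.0f);
    for (int i = 0; i < numFilters * K; ++i) W[i] = float(i % 5) - 2.0f;
    for (int j = 0; j < K; ++j) x[j] = float(j % 3);

    // CPU reference: one dot product per filter.
    for (int f = 0; f < numFilters; ++f)
        for (int j = 0; j < K; ++j)
            ref[f] += W[f * K + j] * x[j];

    float *dW = nullptr, *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dW, W.size() * sizeof(float));
    cudaMalloc(&dx, x.size() * sizeof(float));
    cudaMalloc(&dy, y.size() * sizeof(float));
    cudaMemcpy(dW, W.data(), W.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, x.data(), x.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) {
        cudaFree(dW); cudaFree(dx); cudaFree(dy);
        return -1.0f;
    }

    // Row-major W (130 x 20) looks like a column-major 20 x 130 matrix to
    // cuBLAS, so request its transpose: op(A) is 130 x 20, lda = 20.
    const float alpha = 1.0f, beta = 0.0f;
    cublasStatus_t st = cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                                    numFilters, 1, K,
                                    &alpha, dW, K, dx, K,
                                    &beta, dy, numFilters);

    // cudaMemcpy on the default stream waits for the GEMM to finish.
    cudaError_t err = cudaMemcpy(y.data(), dy, y.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dW); cudaFree(dx); cudaFree(dy);
    if (st != CUBLAS_STATUS_SUCCESS || err != cudaSuccess) return -1.0f;

    float maxDiff = 0.0f;
    for (int f = 0; f < numFilters; ++f)
        maxDiff = fmaxf(maxDiff, fabsf(y[f] - ref[f]));
    return maxDiff;
}

int main()
{
    const float diff = runGemmDemo();
    printf("max abs difference vs CPU: %g\n", diff);
    return (diff >= 0.0f && diff < 1e-4f) ? 0 : 1;
}
```

If you have a batch of inputs, you can stack them as extra columns of m2 and raise the `n` argument (and nothing else in the call) to the batch size, which is where a GEMM call tends to pay off over one matrix-vector product per input.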