How to write a CUDA kernel for convolutions?

Question

I am totally new to CUDA and I would like to write a CUDA kernel that calculates a convolution, given an input matrix, a convolution (or filter) matrix, and an output matrix.

Note: I want each thread of the CUDA kernel to calculate one value in the output matrix.

How can I do this?

einpoklum · Accepted Answer

> I would like to write a cuda kernel that calculates a convolution given an input matrix, convolution (or filter) and an output matrix.

You might be interested in this treatment of the subject (although it's a little old). Alternatively, look at the CUDA convolution kernel sample programs: non-separable and separable.

> I want each thread of the cuda kernel to calculate one value in the output matrix.

If you follow the link, you'll realize you don't quite want that. In other words: don't make rigid assumptions about how your kernel should divide work among the threads; you might change your mind later.
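For reference, the one-thread-per-output-element layout described in the question might be sketched as below. This is a naive baseline to experiment with, not the approach the sample programs take. The kernel name `conv2d_naive`, the row-major layout, the "valid" output size (no padding), and the unflipped filter (strictly speaking a cross-correlation) are all assumptions made for illustration, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread computes one element of a "valid"
// 2D cross-correlation (no padding, filter not flipped), row-major data.
__global__ void conv2d_naive(const float* in, const float* filt, float* out,
                             int in_h, int in_w, int f_h, int f_w)
{
    int out_h = in_h - f_h + 1;
    int out_w = in_w - f_w + 1;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= out_h || col >= out_w) return;  // threads outside the output do nothing

    float acc = 0.0f;
    for (int i = 0; i < f_h; ++i)
        for (int j = 0; j < f_w; ++j)
            acc += in[(row + i) * in_w + (col + j)] * filt[i * f_w + j];
    out[row * out_w + col] = acc;
}

int main()
{
    // Illustrative sizes: 5 x 4 input, 3 x 3 filter, so a 3 x 2 output.
    const int in_h = 5, in_w = 4, f_h = 3, f_w = 3;
    const int out_h = in_h - f_h + 1, out_w = in_w - f_w + 1;

    float h_in[in_h * in_w], h_filt[f_h * f_w], h_out[out_h * out_w];
    for (int i = 0; i < in_h * in_w; ++i) h_in[i] = 1.0f;
    for (int i = 0; i < f_h * f_w; ++i) h_filt[i] = 1.0f;

    float *d_in, *d_filt, *d_out;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_filt, sizeof(h_filt));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    cudaMemcpy(d_filt, h_filt, sizeof(h_filt), cudaMemcpyHostToDevice);

    // Round the grid up so every output element is covered by some thread.
    dim3 block(16, 16);
    dim3 grid((out_w + block.x - 1) / block.x, (out_h + block.y - 1) / block.y);
    conv2d_naive<<<grid, block>>>(d_in, d_filt, d_out, in_h, in_w, f_h, f_w);

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // With all-ones input and filter, each output is the sum of nine ones.
    for (int r = 0; r < out_h; ++r) {
        for (int c = 0; c < out_w; ++c) printf("%g ", h_out[r * out_w + c]);
        printf("\n");
    }

    cudaFree(d_in); cudaFree(d_filt); cudaFree(d_out);
    return 0;
}
```

Every thread here rereads overlapping input elements from global memory, which is one reason a different division of work (for example, tiles staged in shared memory) may turn out to be preferable.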

Tom · Answer

If each filter covers the full extent of the matrix, then the computation can be expressed directly as a cublasSgemm call.

For example, suppose the matrix has dimensions 5 * 4 and you need 130 filters. Then the filter matrix to be trained has dimensions 130 * 20, and the 5 * 4 matrix can be treated as a 20 * 1 vector.

This way the work is handed to a heavily optimized library routine: it becomes a matrix multiplication between m1 (130, 20) and m2 (20, 1).
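A minimal sketch of that multiplication with cuBLAS, using the 130 * 20 and 20 * 1 shapes from the example, might look like this. The variable names, the all-ones test data, and the row-major storage of the filters are assumptions; because cuBLAS expects column-major matrices, the row-major filter buffer is passed with `CUBLAS_OP_T`. Error checking is omitted, and the program needs to be linked against cuBLAS (for example with `-lcublas`).

```cuda
#include <cstdio>
#include <vector>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main()
{
    // Shapes from the example: 130 filters, each covering a 5 x 4 matrix.
    const int num_filters = 130, rows = 5, cols = 4;
    const int k = rows * cols;  // 20 values per filter

    // Assumed layout: filters stored row-major as (num_filters x k).
    std::vector<float> h_filters(num_filters * k, 1.0f);
    std::vector<float> h_input(k, 1.0f);          // flattened 5 x 4 matrix
    std::vector<float> h_out(num_filters, 0.0f);  // one response per filter

    float *d_filters, *d_input, *d_out;
    cudaMalloc(&d_filters, h_filters.size() * sizeof(float));
    cudaMalloc(&d_input, h_input.size() * sizeof(float));
    cudaMalloc(&d_out, h_out.size() * sizeof(float));
    cudaMemcpy(d_filters, h_filters.data(), h_filters.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_input, h_input.data(), h_input.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // A row-major (130 x 20) buffer is a column-major (20 x 130) matrix,
    // so transposing it gives the (130 x 20) operand we want.
    // Result: out (130 x 1) = filters (130 x 20) * input (20 x 1).
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                num_filters, 1, k,
                &alpha,
                d_filters, k,      // lda = 20
                d_input, k,        // ldb = 20
                &beta,
                d_out, num_filters);  // ldc = 130

    cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);

    // With all-ones data, each filter response is the sum of 20 ones.
    printf("first response: %g, last response: %g\n",
           h_out[0], h_out[num_filters - 1]);

    cublasDestroy(handle);
    cudaFree(d_filters); cudaFree(d_input); cudaFree(d_out);
    return 0;
}
```

With a single 20 * 1 input this is really a matrix-vector product, so `cublasSgemv` would also do; the `cublasSgemm` form extends naturally to a batch of inputs by making the second matrix 20 * n.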
