Simple CUBLAS Matrix Multiplication Example?

Tags:

cuda

gpu

matrix-multiplication

cublas

I'm looking for a bare-bones matrix multiplication example for CUBLAS that multiplies M by N and places the result in P for the following code, using high-performance GPU operations:

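#define Width 500

/* Three 500 x 500 float arrays take about 3 MB, which can overflow a
   default stack, so give them static storage. */
static float M[Width][Width], N[Width][Width], P[Width][Width];

/* Inside main() or another function: */
for (int i = 0; i < Width; i++) {
    for (int j = 0; j < Width; j++) {
        M[i][j] = 500;
        N[i][j] = 500;
        P[i][j] = 0;
    }
}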

So far, most of the code I'm finding that does any kind of matrix multiplication with CUBLAS seems overly complicated.

I am attempting to design a basic lab in which students compare the performance of matrix multiplication on the GPU versus the CPU, presumably with the GPU performing better.
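For the CPU half of such a lab, a naive triple loop is the usual baseline. Below is a minimal sketch in plain C, assuming square row-major matrices stored as flat float arrays (element (i, j) at index i * n + j). The function names matmul_cpu and time_matmul_cpu are hypothetical, and clock() measures processor time rather than wall-clock time.

```c
#include <time.h>

/* Hypothetical CPU baseline: P = M * N for n x n row-major matrices
   stored as flat arrays, element (i, j) at index i * n + j. */
void matmul_cpu(const float *M, const float *N, float *P, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            /* Dot product of row i of M with column j of N. */
            for (int k = 0; k < n; k++)
                sum += M[i * n + k] * N[k * n + j];
            P[i * n + j] = sum;
        }
    }
}

/* Hypothetical helper: processor time, in seconds, of one matmul_cpu call. */
double time_matmul_cpu(const float *M, const float *N, float *P, int n)
{
    clock_t start = clock();
    matmul_cpu(M, N, P, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

With the question's all-500 inputs, every element of P should be 500 * 500 * 500 = 125000000. That value, and every partial sum along the way, is exactly representable as a float, so for this particular input a CPU result and a GPU result can reasonably be compared exactly.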


asked Oct 03 '11 15:10

Chris Redford

2 Answers

The SDK contains the matrixMul sample, which illustrates the use of CUBLAS. For a simpler example, see the CUBLAS manual, section 1.3.

The matrixMul sample also includes a custom kernel, which of course won't perform as well as CUBLAS.
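The part of cublasSgemm that most often trips up C programmers is storage order: like the reference BLAS, it assumes column-major matrices and computes C = alpha * op(A) * op(B) + beta * C. A row-major C buffer read as column-major is the transpose of the matrix, so one common trick is to pass the operands swapped. Computing Pᵀ = Nᵀ Mᵀ in column-major order leaves P in row-major order. The sketch below checks this on the CPU with a hypothetical routine, sgemm_colmajor, that mirrors the cublasSgemm argument order for the no-transpose case. It is not cuBLAS itself.

```c
/* Hypothetical CPU stand-in with the same argument order as cublasSgemm
   (no-transpose case): C = alpha * A * B + beta * C, where A is m x k,
   B is k x n, C is m x n, all column-major with leading dimensions
   lda, ldb, ldc. Element (r, c) of A lives at A[r + c * lda]. */
void sgemm_colmajor(int m, int n, int k, float alpha,
                    const float *A, int lda, const float *B, int ldb,
                    float beta, float *C, int ldc)
{
    for (int c = 0; c < n; c++) {
        for (int r = 0; r < m; r++) {
            float sum = 0.0f;
            for (int p = 0; p < k; p++)
                sum += A[r + p * lda] * B[p + c * ldb];
            /* As in BLAS, C is not read when beta is zero, so it may be
               uninitialized on input. */
            if (beta == 0.0f)
                C[r + c * ldc] = alpha * sum;
            else
                C[r + c * ldc] = alpha * sum + beta * C[r + c * ldc];
        }
    }
}

/* Row-major P = M * N (n x n) through the column-major routine: passing
   N first and M second makes the column-major result P^T = N^T * M^T,
   which is exactly row-major P in memory. */
void matmul_rowmajor_via_colmajor(const float *M, const float *N, float *P, int n)
{
    sgemm_colmajor(n, n, n, 1.0f, N, n, M, n, 0.0f, P, n);
}
```

On the GPU, assuming M and N have already been copied into device buffers d_M and d_N and d_P has been allocated (hypothetical names), the corresponding call in the cublas_v2.h API would be `cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, d_N, n, d_M, n, &beta, d_P, n)`, with alpha = 1 and beta = 0 passed by pointer.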


answered Oct 05 '22 05:10

Tom

CUBLAS is not necessary to show the GPU outperforming the CPU, though CUBLAS would probably widen the gap. Many straightforward CUDA implementations (including matrix multiplication) appear to outperform the CPU when given a large enough data set, as explained and demonstrated here:

Simplest Possible Example to Show GPU Outperform CPU Using CUDA
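One way to see why the data set has to be large enough is to count work against data movement. A naive n x n multiply performs about 2n³ floating-point operations (n multiplies and n adds per output element), while copying M and N to the GPU and P back moves 3n² floats. Computation grows with n³ but transfer only with n², so the ratio improves linearly with n. A small C sketch of this count, with hypothetical helper names:

```c
/* Floating-point operations in a naive n x n matrix multiply:
   n * n output elements, each needing n multiplies and n adds. */
double matmul_flops(int n)
{
    return 2.0 * n * n * n;
}

/* Bytes moved between host and device for one multiply:
   M and N copied to the GPU, P copied back, 4-byte floats. */
double matmul_transfer_bytes(int n)
{
    return 3.0 * n * n * sizeof(float);
}

/* FLOPs per transferred byte; for float this is 2n / 12 = n / 6. */
double flops_per_byte(int n)
{
    return matmul_flops(n) / matmul_transfer_bytes(n);
}
```

For the question's 500 x 500 case, this gives 2.5e8 FLOPs against 3,000,000 bytes of transfer, roughly 83 FLOPs per byte. For small n, the copies and kernel-launch overhead can therefore dominate the timing.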

answered Oct 05 '22 05:10

Chris Redford
