Have you successfully used a GPGPU? [closed]

Tags:

cuda

gpgpu

hpc

I am interested to know whether anyone has written an application that takes advantage of a GPGPU by using, for example, nVidia CUDA. If so, what issues did you find and what performance gains did you achieve compared with a standard CPU?

asked Sep 10 '08 by John Channing


People also ask

Why don't we use a GPU instead of a CPU?

While GPUs can process data several orders of magnitude faster than a CPU due to massive parallelism, GPUs are not as versatile as CPUs. CPUs have large and broad instruction sets, managing every input and output of a computer, which a GPU cannot do.

Can CPU be faster than GPU?

Because of their serial processing capabilities, CPUs can multitask across the many activities in your computer, so a strong CPU can provide more speed for typical computer use than a GPU. In specific situations, the CPU will outperform the GPU.

How do I enable CUDA acceleration?

Enable CUDA optimization by going to the system menu and selecting Edit > Preferences. Click the Editing tab, then select the "Enable NVIDIA CUDA/ATI Stream technology to speed up video effect preview/render" check box in the GPU acceleration area. Click OK to save your changes.

What is Cuda library?

NVIDIA® CUDA-X, built on top of NVIDIA CUDA®, is a collection of libraries, tools, and technologies that deliver dramatically higher performance—compared to CPU-only alternatives— across multiple application domains, from artificial intelligence (AI) to high performance computing (HPC).


2 Answers

I have been doing GPGPU development with ATI's Stream SDK instead of CUDA. What kind of performance gain you will get depends on a lot of factors, but the most important is the numeric intensity (that is, the ratio of compute operations to memory references).

A BLAS level-1 or level-2 function like adding two vectors does only 1 math operation for every 3 memory references, so the NI is (1/3). This will always run slower with CAL or CUDA than just doing it on the CPU, mainly because of the time it takes to transfer the data from the CPU to the GPU and back.
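To make that ratio concrete, here is a rough vector-add sketch in CUDA (CUDA rather than the CAL code this answer is based on, and purely illustrative): each thread does one add against two loads and one store, and the host-device copies on either side are pure overhead.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    // y[i] = x[i] + y[i]: one add per three memory references (two loads, one store),
    // so the numeric intensity is about 1/3 and the kernel is bandwidth-bound.
    __global__ void vecAdd(const float *x, float *y, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = x[i] + y[i];
    }

    int main()
    {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        float *hx = (float *)malloc(bytes), *hy = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

        float *dx, *dy;
        cudaMalloc(&dx, bytes);
        cudaMalloc(&dy, bytes);

        // These transfers (and the copy back) usually cost more than the kernel itself.
        cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

        vecAdd<<<(n + 255) / 256, 256>>>(dx, dy, n);

        cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
        printf("y[0] = %f\n", hy[0]);

        cudaFree(dx); cudaFree(dy);
        free(hx); free(hy);
        return 0;
    }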

For a function like an FFT, there are O(N log N) computations and O(N) memory references, so the NI is O(log N). If N is very large, say 1,000,000, it will likely be faster to do it on the GPU; if N is small, say 1,000, it will almost certainly be slower.
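For what it's worth, a 1-D transform with NVIDIA's cuFFT looks roughly like this (my sketch, not part of the original answer; error checking omitted). The two O(N) copies are a fixed toll, which is why only the large transforms come out ahead:

    #include <cufft.h>
    #include <cuda_runtime.h>

    // In-place forward 1-D complex-to-complex FFT of length n on the GPU.
    void gpu_fft(cufftComplex *host_data, int n)
    {
        cufftComplex *d_data;
        cudaMalloc(&d_data, n * sizeof(cufftComplex));
        cudaMemcpy(d_data, host_data, n * sizeof(cufftComplex), cudaMemcpyHostToDevice);

        cufftHandle plan;
        cufftPlan1d(&plan, n, CUFFT_C2C, 1);                 // single transform of length n
        cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);   // O(N log N) work on O(N) data

        cudaMemcpy(host_data, d_data, n * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

        cufftDestroy(plan);
        cudaFree(d_data);
    }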

For a BLAS level-3 or LAPACK function like LU decomposition of a matrix, or finding its eigenvalues, there are O(N^3) computations and O(N^2) memory references, so the NI is O(N). For very small arrays, say N of a few score, this will still be faster to do on the CPU, but as N increases the algorithm quickly goes from memory-bound to compute-bound, and the performance gain on the GPU rises very quickly.
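The flop-to-byte arithmetic is easy to see in a naive matrix multiply (again a CUDA sketch of my own, not the CAL code the numbers below come from): roughly 2*N^3 operations touch only 3*N^2 values, so the NI grows linearly with N.

    // C = A * B for N x N row-major matrices: ~2*N^3 flops over 3*N^2 elements,
    // so numeric intensity is O(N). Real code would tile through shared memory
    // or call a BLAS library, but the flop-to-byte ratio is the point here.
    __global__ void matmul(const float *A, const float *B, float *C, int N)
    {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < N && col < N) {
            float acc = 0.0f;
            for (int k = 0; k < N; ++k)
                acc += A[row * N + k] * B[k * N + col];
            C[row * N + col] = acc;
        }
    }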

Anything involving complex arithmetic has more computations than real scalar arithmetic, which usually doubles the NI and increases GPU performance.
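Counting the work in a single complex multiply shows why (my illustration, not from the answer): each complex product needs four real multiplies and two adds while only moving twice the data of a real product, so the flop-to-byte ratio climbs.

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    // 4 multiplies + 2 adds per complex product, versus 1 multiply per real
    // product, for roughly twice the number of bytes moved.
    __device__ float2 cmul(float2 x, float2 y)
    {
        return make_float2(x.x * y.x - x.y * y.y,
                           x.x * y.y + x.y * y.x);
    }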


[Chart (source: earthlink.net)]

Here is the performance of CGEMM (complex single-precision matrix-matrix multiplication) done on a Radeon 4870.

answered Sep 21 '22 by Die in Sente


I have written trivial applications; it really helps if you can parallelize floating-point calculations.

I found the following course, co-taught by a University of Illinois Urbana-Champaign professor and an NVIDIA engineer, very useful when I was getting started: http://courses.ece.illinois.edu/ece498/al/Archive/Spring2007/Syllabus.html (includes recordings of all lectures).

answered Sep 22 '22 by Tony BenBrahim