
What is the difference between CUDA core and CPU core?

Tags:

cpu

cuda

I've worked a bit with CUDA, and a lot with the CPU, and I'm trying to understand the difference between the two. My i5 processor has 4 cores and cost $200, and my Nvidia GTX 660 has 960 cores and cost about the same.

I would be really happy if someone could explain the key differences between the two processing-unit architectures in terms of abilities, pros, and cons. For example, does a CUDA core have branch prediction?

asked Jan 07 '14 by OopsUser


People also ask

What are CUDA cores in CPU?

CUDA (Compute Unified Device Architecture) cores are the Nvidia GPU equivalent of CPU cores, designed to take on many calculations at the same time, which matters when you're playing a graphically demanding game. One CUDA core is broadly similar to a CPU core.

Does more CUDA cores mean better?

More CUDA cores means more data can be processed in parallel, and a higher clock speed means a single core performs faster. GPUs improve with new generations and architectures, so a graphics card with more CUDA cores is not necessarily more powerful than one with fewer.

Is CUDA CPU or GPU?

CUDA (Compute Unified Device Architecture) is a new hardware and software architecture for issuing and managing data-parallel computations on the GPU without the need of mapping them to a graphics API [1].

What is the difference between CUDA cores and tensor cores?

CUDA cores perform one operation per clock cycle, whereas tensor cores can perform multiple operations per clock cycle. Everything comes with a cost, and here the cost is accuracy: precision takes a hit to boost computation speed.


4 Answers

This is a computer-architecture question that really calls for a long answer; I will keep it very simple at the risk of being inaccurate. You basically answered your own question by asking whether a CUDA core does branch prediction: the answer is no. A CPU core has to handle every kind of operation a computer does (calculation, memory fetching, I/O, interrupts), so it has a huge, complex instruction set, and branch prediction is used to keep instruction fetching fast. It also has a big cache and a fast clock rate. Implementing that instruction set takes more logic, hence more transistors and more cost per core compared to a GPU core.

GPU cores have less cache memory, simpler instructions, and a lower clock rate, but they are optimized to do a lot of computation as a group. The simpler instruction set and smaller cache make them less expensive per core.

answered Oct 17 '22 by Nadim Farhat


CUDA cores are more like lanes of a vector unit, gathered into warps. In essence, a CUDA core is one entry in a wide AVX, VSX, or NEON vector register.

The closest thing to a CPU core is an SMX (streaming multiprocessor). It can handle multiple contexts (warps, analogous to hyper-threading/SMT) and has several parallel execution pipelines (6 FP32 on Kepler, versus 2 on Haswell or Power 8). And each SMX is independent, just like a core of a general-purpose CPU.

This analogy is detailed further here: https://stackoverflow.com/a/36812922/6218300.

answered Oct 17 '22 by Florent DUGUET


They are now, in principle, much the same as CPU cores. That wasn't true until fairly recently: in 2005, for example, they were unable to process integer operands.

When comparing CPU core complexity with your i5, keep in mind that the original 80386 had only about 275K transistors while a Core 2 Duo has about 230 million, roughly 1000 times more, so the numbers fit well.

The biggest difference is memory handling, which has become even more complicated than in the good old days when we needed segmentation registers. There is no virtual memory and so on, and that is a serious bottleneck when you try to port normal CPU programs; but the real problem is that non-local memory access is very expensive, 400-800 cycles. GPUs use a technique that, outside the GPU world, only Sun's Niagara T1/T2 general-purpose CPUs had: while one group of threads (a warp) waits for a memory access, the scheduler switches to a different warp whose instructions are ready. But if all your threads do is jump non-linearly around your data, performance simply collapses.

answered Oct 17 '22 by Lothar


You need to understand the fundamental difference between CPU and GPU and the need that drove the rise of GPGPU in recent times. One informative course on this is available on Udacity.

Also, this book might be helpful for beginner-level programs.

Though this is not strictly a programming question, I hope it helps someone.

answered Oct 17 '22 by Itachi