GPUs use the SIMD paradigm: the same piece of code is executed in parallel and applied to many elements of a data set. However, CPUs also use SIMD and provide instruction-level parallelism. For example, as far as I know, SSE-like instructions process several data elements in parallel. Although the SIMD paradigm seems to be used differently on GPUs and CPUs, do GPUs have more SIMD power than CPUs? In what way are the parallel computational capabilities of a CPU 'weaker' than those of a GPU?

CPU SIMD vs GPU SIMD?

2 Answers

Both CPUs and GPUs provide SIMD, and the most common conceptual unit is 16 bytes (128 bits): for example, a vector of 4 floats (x, y, z, w).
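As a rough illustration of that 128-bit unit on the CPU side, here is a minimal sketch in C. The function name `add4` is made up for this example. It assumes an x86 compiler that defines `__SSE__` and falls back to a plain loop elsewhere.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for the 4 floats of one vector. */
void add4(const float *a, const float *b, float *out)
{
    assert(a && b && out);
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);             /* load 4 floats (16 bytes), unaligned is fine */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* one add covers all 4 lanes */
#else
    /* Scalar fallback for targets without SSE: same result, one lane at a time. */
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

Calling `add4` on (1, 2, 3, 1) and (0.5, 0.5, 0.5, 0) writes (1.5, 2.5, 3.5, 1) to `out`; on the SSE path the four additions are a single instruction.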

Simplifying:

CPUs then gain further parallelism by pipelining upcoming instructions, so that a single program proceeds faster. The next step is multiple cores, which run independent programs.

GPUs, on the other hand, parallelize by extending the SIMD approach and executing the same program many times. This happens in two ways. The first is pure SIMD, where a set of program instances executes in lock step. This is why branching is costly on a GPU: when instances in the same set take different sides of an if statement, both sides must execute, with the unwanted result thrown away, so that the lock-step instances proceed at the same rate. The second is single program, multiple data (SPMD), where groups of these sets run the same program in parallel but not necessarily in lock step.
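The cost of branching in lock step can be sketched in plain C. This is only a CPU-side simulation under assumptions, not how any particular GPU is implemented. `LANES`, `run_lockstep` and the per-lane program are invented for the example, and the group width of 4 is purely illustrative (real GPUs commonly run 32 or 64 instances per lock-step group).

```c
#include <assert.h>

#define LANES 4  /* illustrative group width, chosen for readability */

/* Simulates one lock-step group running the hypothetical per-lane program
 *     if (x < 0) y = -x; else y = x * 2;
 * Returns how many passes over the group were needed. */
int run_lockstep(const float x[LANES], float y[LANES])
{
    int mask[LANES];
    int any_then = 0, any_else = 0;
    int passes = 0;

    assert(x && y);

    /* Every lane evaluates the condition; the result becomes a per-lane mask. */
    for (int i = 0; i < LANES; ++i) {
        mask[i] = x[i] < 0.0f;
        any_then |= mask[i];
        any_else |= !mask[i];
    }

    /* "Then" side: the whole group steps through it, masked-off lanes discard the result. */
    if (any_then) {
        for (int i = 0; i < LANES; ++i) {
            float t = -x[i];
            if (mask[i]) y[i] = t;
        }
        ++passes;
    }

    /* "Else" side: same again with the mask inverted. */
    if (any_else) {
        for (int i = 0; i < LANES; ++i) {
            float e = x[i] * 2.0f;
            if (!mask[i]) y[i] = e;
        }
        ++passes;
    }
    return passes;
}
```

With inputs of mixed sign the group needs two passes, and half of the work in each pass is thrown away. When every lane takes the same side, one pass is enough, which is why uniform branches are much cheaper than divergent ones.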

The GPU approach works well when exactly the same processing needs to be applied to large volumes of data: for example, a million vertices that need to be transformed in the same way, or many millions of pixels that need processing to produce their colour. Provided they do not stall on data or in the pipeline, GPU programs generally offer more predictable, time-bounded execution because of these restrictions. That in turn suits temporal parallelism, where the programs must repeat their cycle at a fixed rate, for example 60 times a second (about 16.7 ms per frame) for 60 fps.

The CPU approach, however, is better for decision-making, for performing several different tasks at the same time, and for dealing with changing inputs and requests.

Apart from its many other uses and purposes, the CPU is used to orchestrate work for the GPU to perform.

answered Oct 08 '22 11:10

Ben Adams

It's a similar idea. Very informally speaking, it goes like this:

The CPU has a fixed set of functions that operate on packed values. Depending on the brand and generation of your CPU, you might have access to SSE2, SSE3, SSE4, 3DNow!, etc., and each extension gives you more functions. You are limited by the register size: the larger the data type you work with, the fewer values you can process in parallel. You can freely mix SIMD instructions with traditional x86/x64 instructions.
The GPU lets you write an entire pipeline that runs for each pixel of a texture. The texture size doesn't depend on your pipeline length, i.e. the number of values you can affect in one cycle depends only on your GPU, and the functions you can chain (your pixel shader) can be pretty much anything. It is more rigid, though: setting up and reading back your values is somewhat slower, and it is a one-shot process (load values, run shader, read values) with no chance to touch the values in between, so you need to process a lot of values for it to be worth it.
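For the CPU side described above, here is a minimal sketch in C of both points: the register size limits how many values fit in one operation, and SIMD and ordinary scalar code mix freely. The names `lanes_per_register` and `sum_floats` are made up for this example. It assumes 128-bit SSE registers and an x86 compiler that defines `__SSE__`, with a scalar-only fallback otherwise.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Values that fit in one 128-bit (16-byte) register for a given element size. */
size_t lanes_per_register(size_t elem_size)
{
    assert(elem_size > 0 && elem_size <= 16);
    return 16 / elem_size;
}

/* Hypothetical helper: sums n floats, 4 at a time with SSE, then finishes
 * the leftover elements with ordinary scalar code. */
float sum_floats(const float *data, size_t n)
{
    size_t i = 0;
    float total = 0.0f;

    assert(data || n == 0);
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();              /* four running partial sums */
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));

    float partial[4];
    _mm_storeu_ps(partial, acc);                /* back to scalar code */
    total = partial[0] + partial[1] + partial[2] + partial[3];
#endif
    /* Scalar tail (or the whole array when SSE is unavailable). */
    for (; i < n; ++i)
        total += data[i];
    return total;
}
```

`lanes_per_register(sizeof(float))` gives 4 and `lanes_per_register(sizeof(double))` gives 2, which is the "larger types, fewer values" trade-off. Note that summing in four partial lanes changes the order of floating-point additions, so results can differ slightly from a strictly sequential sum on inputs that are not exactly representable.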

answered Oct 08 '22 11:10

Blindy

Tags: parallel-processing, cpu, gpu, simd

Carmellose
