 

Why use SIMD if we have GPGPU? [closed]

Now that we have GPGPUs with languages like CUDA and OpenCL, do the multimedia SIMD extensions (SSE/AVX/NEON) still serve a purpose?

I read an article recently about how SSE instructions could be used to accelerate sorting networks. I thought this was pretty neat, but when I told my comp arch professor, he laughed and said that running similar code on a GPU would destroy the SIMD version. I don't doubt this, because SSE is very simple and GPUs are large, highly complex accelerators with a lot more parallelism, but it got me thinking: are there many scenarios where the multimedia SIMD extensions are more useful than using a GPU?
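For context, the core operation in such a SIMD sorting network is a compare-exchange step performed across several lanes at once. A minimal sketch with SSE intrinsics (my own illustration, not code from the article):

    #include <xmmintrin.h>  /* SSE */

    /* Illustrative sketch of the building block of a SIMD sorting network:
       one compare-exchange step applied to four float pairs at once.
       After the call, a holds the lane-wise minima and b the maxima.
       Assumes SSE support. */
    static void compare_exchange4(__m128 *a, __m128 *b)
    {
        __m128 lo = _mm_min_ps(*a, *b);  /* lane-wise minimum */
        __m128 hi = _mm_max_ps(*a, *b);  /* lane-wise maximum */
        *a = lo;
        *b = hi;
    }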

If GPGPUs make SIMD redundant, why would Intel be increasing their SIMD support? SSE was 128 bits; now it's 256 bits with AVX, and next year it will be 512 bits. If GPGPUs are better at processing code with data parallelism, why is Intel pushing these SIMD extensions? They could instead put the equivalent resources (research and die area) into a larger cache and branch predictor, improving serial performance.

Why use SIMD instead of GPGPUs?

asked Sep 02 '14 by jonfrazen1




2 Answers

Absolutely, SIMD is still relevant.

First, SIMD can more easily interoperate with scalar code, because it can read and write the same memory directly, while GPUs require the data to be uploaded to GPU memory before it can be accessed. For example, it's straightforward to vectorize a function like memcmp() via SIMD, but it would be absurd to implement memcmp() by uploading the data to the GPU and running it there. The latency would be crushing.
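To illustrate, here is a minimal sketch of a vectorized equality check in the spirit of memcmp(), assuming SSE2 and a length that is a multiple of 16 bytes (a real memcmp() also needs ordering semantics and a scalar tail loop):

    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stddef.h>

    /* Hypothetical sketch: returns nonzero if the two buffers are equal.
       Assumes SSE2 and n being a multiple of 16 bytes. */
    static int equal_sse2(const void *a, const void *b, size_t n)
    {
        const __m128i *pa = (const __m128i *)a;
        const __m128i *pb = (const __m128i *)b;
        for (size_t i = 0; i < n / 16; i++) {
            __m128i va = _mm_loadu_si128(pa + i);   /* unaligned 16-byte load */
            __m128i vb = _mm_loadu_si128(pb + i);
            __m128i eq = _mm_cmpeq_epi8(va, vb);    /* byte-wise compare      */
            if (_mm_movemask_epi8(eq) != 0xFFFF)    /* any mismatching byte?  */
                return 0;
        }
        return 1;
    }

The point is that this works directly on the caller's memory, with no transfer to another device and back.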

Second, both SIMD and GPUs are bad at highly branchy code, but SIMD is somewhat less bad. This is because GPUs group multiple threads (a "warp") under a single instruction dispatcher. So what happens when threads need to take different paths: an if branch is taken in one thread, and the else branch is taken in another? This is called "branch divergence," and it is slow: all the "if" threads execute while the "else" threads wait, and then the "else" threads execute while the "if" threads wait. CPU cores, of course, do not have this limitation.
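For comparison, SIMD code typically handles a data-dependent if/else in an analogous way: it evaluates both sides for every lane and blends the results with a mask. A hedged SSE2 sketch of that pattern (my own example, not from the answer):

    #include <emmintrin.h>

    /* Illustrative sketch: the SIMD analogue of branch divergence.
       For each 32-bit lane compute (x > 0) ? x * 2 : -x by evaluating
       BOTH branches and blending the results with a mask. Assumes SSE2. */
    static __m128i double_or_negate(__m128i x)
    {
        __m128i zero      = _mm_setzero_si128();
        __m128i mask      = _mm_cmpgt_epi32(x, zero);   /* x > 0 ? all-ones : zero */
        __m128i taken     = _mm_slli_epi32(x, 1);       /* "if" side:   x * 2      */
        __m128i not_taken = _mm_sub_epi32(zero, x);     /* "else" side: -x         */
        /* blend: (taken & mask) | (not_taken & ~mask) */
        return _mm_or_si128(_mm_and_si128(mask, taken),
                            _mm_andnot_si128(mask, not_taken));
    }

Both sides are always computed, but the cost is a handful of extra instructions in a single core rather than whole warps idling.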

The upshot is that SIMD is better for what might be called "intermediate workloads:" workloads up to intermediate size, with some data-parallelism, some unpredictability in access patterns, some branchiness. GPUs are better for very large workloads that have predictable execution flow and access patterns.

(There are also some peripheral reasons, such as better support for double-precision floating point in CPUs.)

answered Sep 27 '22 by ridiculous_fish


GPUs have controllable dedicated caches; CPUs have better branching. Other than that, compute performance relies on SIMD width, integer core density, and instruction-level parallelism.

Another important parameter is how far the data is from the CPU or GPU. (Your data could be an OpenGL buffer in a discrete GPU's memory, so you may need to download it to RAM before computing with the CPU; the same effect is seen when a host buffer sits in RAM and needs to be computed on a discrete GPU.)
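To make that transfer cost concrete, here is a hedged sketch of the extra copy steps with the OpenCL host API (the context, queue, and host buffers are assumed to exist; error handling and the kernel itself are omitted):

    #include <CL/cl.h>

    /* Sketch of the extra step a discrete GPU needs: host data in RAM
       must be copied into a device buffer before a kernel can touch it,
       and results copied back afterwards. */
    void run_on_gpu(cl_context ctx, cl_command_queue queue,
                    const float *host_data, float *host_out, size_t n)
    {
        cl_mem dev_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                        n * sizeof(float), NULL, NULL);

        /* upload: RAM -> GPU memory (over PCIe for a discrete card) */
        clEnqueueWriteBuffer(queue, dev_buf, CL_TRUE, 0,
                             n * sizeof(float), host_data, 0, NULL, NULL);

        /* ... enqueue kernel(s) operating on dev_buf here ... */

        /* download: GPU memory -> RAM */
        clEnqueueReadBuffer(queue, dev_buf, CL_TRUE, 0,
                            n * sizeof(float), host_out, 0, NULL, NULL);

        clReleaseMemObject(dev_buf);
    }

A SIMD routine on the CPU skips both copies entirely, which is why small or latency-sensitive workloads often stay on the host.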

answered Sep 28 '22 by huseyin tugrul buyukisik