I am a beginner in parallel programming. I have a question that might seem silly, but I didn't find a definitive answer when I googled it.
In GPU computing there is a device, i.e. the GPU, and a host, i.e. the CPU. I wrote a simple hello world program that allocates some memory on the GPU, passes two parameters (say src[] and dest[]) to the kernel, copies the src string, i.e. "Hello world", to the dest string, and copies the dest string from the GPU back to the host.
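For reference, this is roughly what such a program looks like in CUDA (a minimal sketch; the kernel name copyString and the launch configuration are my own choices for illustration):

    #include <cstdio>
    #include <cuda_runtime.h>

    // Trivial kernel: each thread copies one character from src to dest.
    __global__ void copyString(const char *src, char *dest, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            dest[i] = src[i];
    }

    int main()
    {
        const char h_src[] = "Hello world";
        const int n = sizeof(h_src);               // includes the trailing '\0'
        char h_dest[sizeof(h_src)] = {0};

        char *d_src, *d_dest;
        cudaMalloc((void **)&d_src, n);            // allocate device memory
        cudaMalloc((void **)&d_dest, n);

        cudaMemcpy(d_src, h_src, n, cudaMemcpyHostToDevice);   // host -> device
        copyString<<<1, n>>>(d_src, d_dest, n);                // copy on the GPU
        cudaMemcpy(h_dest, d_dest, n, cudaMemcpyDeviceToHost); // device -> host

        printf("%s\n", h_dest);                    // prints "Hello world"

        cudaFree(d_src);
        cudaFree(d_dest);
        return 0;
    }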
Is the src string read by the GPU, or does the CPU write it to the GPU? Also, when we get the string back from the GPU, is the GPU writing to the CPU or is the CPU reading from the GPU?
In transferring the data back and forth there are four possibilities:

1. CPU to GPU: either the CPU writes to the GPU, or the GPU reads from the CPU
2. GPU to CPU: either the GPU writes to the CPU, or the CPU reads from the GPU
Can someone please explain which of these are possible and which are not?
In earlier versions of CUDA and corresponding hardware models, the GPU was more strictly a coprocessor owned by the CPU; the CPU wrote information to the GPU, and read the information back when the GPU was ready. At the lower level, this meant that really all four things were happening: the CPU wrote data to PCIe, the GPU read data from PCIe, the GPU then wrote data to PCIe, and the CPU read back the result. But transactions were initiated by the CPU.
More recently (CUDA 3? 4? maybe even beginning in 2?), some of these details are hidden from the application level, so that, effectively, GPU code can cause transfers to be initiated in much the same way as the CPU can. Consider unified virtual addressing, whereby programmers can access a unified virtual address space for CPU and GPU memory. When the GPU requests memory in the CPU space, this must initiate a transfer from the CPU, essentially reading from the CPU. The ability to put data onto the GPU from the CPU side is also retained. Basically, all ways are possible now, at the top level (at low levels, it's largely the same sort of protocol as always: both read from and write to the PCIe bus, but now, GPUs can initiate transactions as well).
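As a concrete illustration of a GPU-initiated read, here is a sketch using pinned, mapped ("zero-copy") host memory; the kernel name readHostMemory and the tiny data size are mine, for illustration only. Every access the kernel makes to hostData is a transaction that the GPU itself initiates over the bus:

    #include <cstdio>
    #include <cuda_runtime.h>

    // The kernel reads directly from host (CPU) memory: each access is a
    // GPU-initiated read over the bus, not a bulk copy staged by the CPU.
    __global__ void readHostMemory(const int *hostData, int *result, int n)
    {
        int sum = 0;
        for (int i = 0; i < n; ++i)
            sum += hostData[i];            // GPU pulls each value from host RAM
        *result = sum;
    }

    int main()
    {
        cudaSetDeviceFlags(cudaDeviceMapHost);     // enable mapped host memory

        const int n = 4;
        int *h_data, *d_data;

        // Pinned, mapped host allocation, visible to the GPU.
        cudaHostAlloc((void **)&h_data, n * sizeof(int), cudaHostAllocMapped);
        for (int i = 0; i < n; ++i) h_data[i] = i + 1;
        cudaHostGetDevicePointer((void **)&d_data, h_data, 0);

        int *d_result;
        cudaMalloc((void **)&d_result, sizeof(int));
        readHostMemory<<<1, 1>>>(d_data, d_result, n);

        int h_result = 0;
        cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
        printf("sum = %d\n", h_result);            // prints "sum = 10"

        cudaFreeHost(h_data);
        cudaFree(d_result);
        return 0;
    }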
Actually, none of these. Your CPU code initiates the copy, but the data itself is transferred by the memory controller to the memory of the GPU over whatever bus your system has; meanwhile, the CPU can process other data. Similarly, when the GPU has finished running the kernels you launched, your CPU code initiates the copy back, but meanwhile both the GPU and the CPU can handle other data or run other code.
Such copies are called asynchronous or non-blocking. You can also do blocking copies, in which the CPU waits for the copy to complete.
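In CUDA, for example, cudaMemcpy is the blocking form and cudaMemcpyAsync the non-blocking one. A minimal sketch (note that a truly asynchronous copy requires pinned host memory, hence cudaMallocHost; with ordinary pageable memory the async call degrades to a blocking copy):

    #include <cuda_runtime.h>

    int main()
    {
        const size_t nbytes = 1 << 20;
        char *h_buf, *d_buf;

        cudaMallocHost((void **)&h_buf, nbytes);   // pinned host memory
        cudaMalloc((void **)&d_buf, nbytes);

        // Blocking copy: the CPU thread waits here until the transfer is done.
        cudaMemcpy(d_buf, h_buf, nbytes, cudaMemcpyHostToDevice);

        // Non-blocking copy: the call returns immediately; the GPU's copy
        // (DMA) engine moves the data while the CPU keeps running.
        cudaStream_t stream;
        cudaStreamCreate(&stream);
        cudaMemcpyAsync(d_buf, h_buf, nbytes, cudaMemcpyHostToDevice, stream);

        // ... the CPU is free to do unrelated work here ...

        cudaStreamSynchronize(stream);   // wait only when the data is needed

        cudaStreamDestroy(stream);
        cudaFreeHost(h_buf);
        cudaFree(d_buf);
        return 0;
    }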
When launching asynchronous tasks, you usually register an "event", which is a kind of flag that you can check later on to see whether the task has finished.
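In CUDA this is done with cudaEvent_t; a sketch (the polling loop is just one way to use it; cudaEventSynchronize would block until the event completes instead):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        const size_t nbytes = 1 << 20;
        char *h_buf, *d_buf;
        cudaMallocHost((void **)&h_buf, nbytes);   // pinned host memory
        cudaMalloc((void **)&d_buf, nbytes);

        cudaStream_t stream;
        cudaStreamCreate(&stream);
        cudaEvent_t done;
        cudaEventCreate(&done);

        cudaMemcpyAsync(d_buf, h_buf, nbytes, cudaMemcpyHostToDevice, stream);
        cudaEventRecord(done, stream);   // flag the point right after the copy

        // Poll the event while doing other CPU work.
        while (cudaEventQuery(done) == cudaErrorNotReady) {
            // ... do something useful on the CPU ...
        }
        printf("copy finished\n");

        cudaEventDestroy(done);
        cudaStreamDestroy(stream);
        cudaFreeHost(h_buf);
        cudaFree(d_buf);
        return 0;
    }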