I am creating something similar to CUDA, and I noticed that copying memory from RAM to VRAM is very fast, about as fast as copying from RAM to RAM. But copying from VRAM to RAM is much slower than RAM to VRAM.
By the way, I am using glTexSubImage2D to copy from RAM to VRAM and glGetTexImage to copy from VRAM to RAM.
Why is that? Is there a way to make it perform like the RAM to VRAM copy?
Transferring data from the GPU back to the CPU has always been a slow operation.
A GPU -> CPU readback introduces a "sync point": the CPU must wait for the GPU to finish its pending work before it can read the result. During that wait the CPU stops feeding the GPU new commands, so the GPU stalls.
Now, remember that a modern GPU is designed to be highly parallel, with thousands of threads in flight at any given moment. The sync point has to wait for all of those threads to finish before it can read back the result of their work, and once the readback completes, all of those threads must spin up again from zero... bad!
Reading back the results asynchronously (a few frames later) lets the GPU continue execution without its threads starving (the stop-and-resume problem outlined above). This improves performance tremendously; the more parallel the GPU, the bigger the improvement.
Depending on your graphics chip and driver, you may get better performance by using PBOs (Pixel Buffer Objects).
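As a rough sketch of what an asynchronous readback through a PBO might look like (assuming an RGBA8 texture, an extension loader such as GLEW, and a context that is already current; the sizes and function names here are illustrative, not a drop-in implementation):

```c
#include <GL/glew.h>
#include <string.h>

#define WIDTH  1024   /* assumed texture size */
#define HEIGHT 1024

static GLuint pbo;

void init_readback(void)
{
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    /* Allocate once; GL_STREAM_READ hints that the GPU writes and the CPU reads. */
    glBufferData(GL_PIXEL_PACK_BUFFER, WIDTH * HEIGHT * 4, NULL, GL_STREAM_READ);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

/* Frame N: start the transfer. With a buffer bound to GL_PIXEL_PACK_BUFFER,
   glGetTexImage writes into the PBO and returns without forcing a sync point;
   the last argument is an offset into the buffer, not a CPU pointer. */
void start_readback(GLuint tex)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, (void *)0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

/* A frame or two later: map the buffer and copy the pixels out. By then the
   GPU has (hopefully) finished the transfer, so the map should not stall. */
void finish_readback(void *dst)
{
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    void *src = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (src) {
        memcpy(dst, src, WIDTH * HEIGHT * 4);
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}
```

The key point is to separate start_readback and finish_readback by at least a frame (or use a ring of two or three PBOs), so the map never has to wait on work the GPU is still executing.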