I know that CUDA kernels can be "overlapped" by putting them into separate streams, but I'm wondering whether it would be possible to transfer memory during kernel execution. CUDA kernel launches are asynchronous, after all.
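Yes. You can run kernels, host-to-device transfers and device-to-host transfers concurrently, provided the copies are issued with cudaMemcpyAsync on page-locked (pinned) host memory and placed in a different non-default stream from the kernel they should overlap with. How many copies can be in flight at once depends on the device's number of copy engines (asyncEngineCount): with one engine, a copy can overlap a kernel; with two, a host-to-device and a device-to-host copy can also overlap each other.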
http://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf
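A minimal sketch of the pattern described above, assuming a device with at least one copy engine. The data is split into chunks, and each chunk is copied in, processed and copied back in its own stream, so one chunk's transfers can overlap another chunk's kernel. The kernel `scale`, the chunk count and the sizes are hypothetical, chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report the failing call and exit main with status 1 (only usable inside main).
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e_)); \
    return 1; } } while (0)

// Hypothetical kernel: multiply each element of the chunk by f, in place.
__global__ void scale(float* d, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

int main() {
    const int kChunks = 4;          // number of streams / chunks (illustrative)
    const int kChunk = 1 << 20;     // elements per chunk (illustrative)
    const int n = kChunks * kChunk;
    const size_t bytes = kChunk * sizeof(float);

    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    printf("copy engines (asyncEngineCount): %d\n", prop.asyncEngineCount);

    float* h = nullptr;
    float* d = nullptr;
    CHECK(cudaMallocHost(&h, n * sizeof(float)));  // pinned memory, needed for copies to overlap
    CHECK(cudaMalloc(&d, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t s[kChunks];
    for (int c = 0; c < kChunks; ++c) CHECK(cudaStreamCreate(&s[c]));

    // Within one stream, copy-in -> kernel -> copy-out run in order.
    // Across streams they may run concurrently, e.g. chunk 1's copy-in
    // while chunk 0's kernel is still executing.
    for (int c = 0; c < kChunks; ++c) {
        size_t off = (size_t)c * kChunk;
        CHECK(cudaMemcpyAsync(d + off, h + off, bytes, cudaMemcpyHostToDevice, s[c]));
        scale<<<(kChunk + 255) / 256, 256, 0, s[c]>>>(d + off, kChunk, 2.0f);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h + off, d + off, bytes, cudaMemcpyDeviceToHost, s[c]));
    }
    CHECK(cudaDeviceSynchronize());  // wait for all streams before reading h

    // Every element should now be 1.0f * 2.0f == 2.0f.
    for (int i = 0; i < n; ++i) {
        if (h[i] != 2.0f) { fprintf(stderr, "mismatch at %d\n", i); return 1; }
    }
    printf("ok\n");

    for (int c = 0; c < kChunks; ++c) CHECK(cudaStreamDestroy(s[c]));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return 0;
}
```

Whether the operations actually overlap is up to the hardware and driver, so it is worth confirming on a timeline, for example in Nsight Systems. On older GPUs the order in which work is issued across streams can also affect how much overlap you get.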