 

cudaMallocManaged vs cudaMalloc - Device memory limitation scenario

I understand that cudaMallocManaged simplifies memory access by eliminating the need for explicit memory allocations on host and device. Consider a scenario where the host memory is significantly larger than the device memory, say 16 GB of host memory and 2 GB of device memory, which is fairly common these days. Suppose I am dealing with large input data, say 4-5 GB, read from an external data source. Am I forced to resort to explicit host and device memory allocation (as the device memory cannot hold all of it at once), or does the CUDA unified memory model have a way around this (something like automatic allocation/deallocation on an as-needed basis)?

asked Dec 21 '14 at 17:12 by mssrivatsa



1 Answer

Am I forced to resort to explicit host and device memory allocation?

You are not forced to resort to explicit host and device memory allocation, but you will be forced to manage the amount of allocated memory manually. This is because, on current hardware at least, CUDA Unified Memory does not allow you to oversubscribe GPU memory: cudaMallocManaged will fail once you try to allocate more memory than is available on the device. That doesn't mean you can't use cudaMallocManaged; it merely means you have to keep track of how much memory is allocated and never exceed what the device can support, by "streaming" your data in chunks that fit on the device instead of allocating everything at once.
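A minimal sketch of the streaming approach, assuming the input is already in a host `std::vector<float>` (in practice each chunk could be read from the external source directly into the managed buffer, since the host can write to it without a `cudaMemcpy`). The kernel `scaleKernel` and the function `processInChunks` are hypothetical stand-ins, and sizing the buffer at half of the currently free device memory is an arbitrary safety margin, not a rule.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstring>
#include <vector>

// Hypothetical per-element work; stands in for the real computation.
__global__ void scaleKernel(float* data, size_t n, float factor) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Streams `data` through a single managed buffer that never exceeds
// half of the currently free device memory. Returns false on any CUDA error.
bool processInChunks(std::vector<float>& data, float factor) {
    const size_t total = data.size();
    if (total == 0) return true;

    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) return false;

    // Arbitrary safety margin: use at most half of the free device memory.
    const size_t chunkElems = std::min(total, (freeBytes / 2) / sizeof(float));
    if (chunkElems == 0) return false;

    float* buf = nullptr;
    if (cudaMallocManaged(&buf, chunkElems * sizeof(float)) != cudaSuccess) return false;

    bool ok = true;
    for (size_t offset = 0; ok && offset < total; offset += chunkElems) {
        const size_t n = std::min(chunkElems, total - offset);
        // The host writes straight into managed memory: no cudaMemcpy needed.
        // (This is where the next chunk would be read from the data source.)
        std::memcpy(buf, data.data() + offset, n * sizeof(float));

        const unsigned threads = 256;
        const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
        scaleKernel<<<blocks, threads>>>(buf, n, factor);

        // The host must not touch managed memory until the kernel has finished.
        ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
        if (ok) std::memcpy(data.data() + offset, buf, n * sizeof(float));
    }
    cudaFree(buf);
    return ok;
}
```

The peak device footprint is one chunk regardless of the input size, so a 4-5 GB input can be processed on a 2 GB card; the trade-off is that the kernel only ever sees one chunk at a time.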

This is pure speculation, as I can't speak for NVIDIA, but I believe this could be one of the future improvements on upcoming hardware.


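And indeed, a year and a half after the above prediction, as of CUDA 8, Pascal GPUs are enhanced with a page-faulting capability that allows memory pages to migrate between the host and the device without explicit intervention from the programmer. This also lets cudaMallocManaged allocations exceed the physical device memory (on Linux; Windows still uses the pre-Pascal Unified Memory model).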
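A minimal sketch of oversubscription, assuming a Pascal or newer GPU on Linux and enough host RAM to back the pages evicted from the device. It uses the `cudaDevAttrConcurrentManagedAccess` attribute as a proxy for the page-faulting Unified Memory model (treating it as that proxy is an assumption; the CUDA Programming Guide gives the precise conditions), allocates more managed memory than the device physically has, fills it from a kernel and spot-checks it from the host. The functions `supportsOversubscription`, `fillKernel` and `touchOversubscribed` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// True if the device reports concurrent managed access, used here as a proxy
// for the page-faulting Unified Memory model (assumption: Pascal+ on Linux).
bool supportsOversubscription(int device) {
    int concurrent = 0;
    if (cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, device) != cudaSuccess)
        return false;
    return concurrent != 0;
}

// Grid-stride loop so a fixed-size grid covers any allocation size.
__global__ void fillKernel(unsigned char* p, size_t n, unsigned char value) {
    const size_t stride = static_cast<size_t>(gridDim.x) * blockDim.x;
    for (size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x; i < n; i += stride)
        p[i] = value;
}

// Allocates `factor` times the device's total memory as managed memory,
// fills it on the GPU and spot-checks it on the host.
bool touchOversubscribed(double factor) {
    int dev = 0;
    if (cudaGetDevice(&dev) != cudaSuccess || !supportsOversubscription(dev)) return false;

    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) return false;
    const size_t bytes = static_cast<size_t>(factor * static_cast<double>(totalBytes));
    if (bytes == 0) return false;

    // Under the pre-Pascal model this call would fail once bytes > device memory.
    unsigned char* p = nullptr;
    if (cudaMallocManaged(&p, bytes) != cudaSuccess) return false;

    // Pages fault onto the device as the kernel touches them; older ones get evicted.
    fillKernel<<<1024, 256>>>(p, bytes, 42);
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    // Host reads migrate the touched pages back to system memory on demand.
    ok = ok && p[0] == 42 && p[bytes - 1] == 42;
    cudaFree(p);
    return ok;
}
```

Expect it to be much slower than an allocation that fits on the device, since pages thrash between host and GPU; `cudaMemPrefetchAsync` and `cudaMemAdvise` can be used to steer migration when the access pattern is known.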

answered Nov 11 '22 at 02:11 by user703016
