I am using the CUDA SDK 4.0 and am encountering an issue that has taken me two days to whittle down into the following code.
#include <cuda.h>
#include <cuda_runtime.h>
int main(int argc, char **argv) {
    int *test;
    cudaError_t err;
    err = cudaSetDevice(1); err = cudaMallocHost(&test, 1024 * sizeof(int));
    err = cudaSetDevice(0); err = cudaFreeHost(test);
    return 0;
}
This throws the following error when calling cudaFreeHost:
First-chance exception at 0x000007fefd96aa7d in Test.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0022f958.
The err value is cudaErrorInvalidValue.
The same error occurs for this variation:
err = cudaSetDevice( 0 ); err = cudaMallocHost(&test, 1024*sizeof(int));
err = cudaSetDevice( 1 ); err = cudaFreeHost(test);
The following variations don't throw the error:
err = cudaSetDevice( 0 ); err = cudaMallocHost(&test, 1024*sizeof(int));
err = cudaSetDevice( 0 ); err = cudaFreeHost(test);
and
err = cudaSetDevice( 1 ); err = cudaMallocHost(&test, 1024*sizeof(int));
err = cudaSetDevice( 1 ); err = cudaFreeHost(test);
I was under the impression that you only needed to call cudaSetDevice if you wanted to allocate memory on a specific GPU. In the example above I am only allocating pinned memory on the CPU.
Is this a bug or did I miss something in the manual?
Memory management with CUDA works much like it does in ordinary CPU programming: you allocate buffers on the host and on the device, transfer the input data to the device through the runtime API, transfer the results back to the host, and finally free everything you allocated.
cudaMalloc allocates memory on the device, much like malloc allocates memory on the host. Memory allocated with cudaMalloc must be freed with cudaFree.
cudaMallocHost allocates page-locked (pinned) memory on the host, which must be freed with cudaFreeHost.
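To make that cycle concrete, here is a minimal sketch of the allocate / transfer / free pattern described above; the buffer names and sizes are illustrative, not taken from the question.

#include <cuda_runtime.h>

int main(void) {
    const size_t n = 1024;
    int *h_buf;   // pinned host buffer
    int *d_buf;   // device buffer

    cudaMallocHost((void **)&h_buf, n * sizeof(int));  // page-locked host memory
    cudaMalloc((void **)&d_buf, n * sizeof(int));      // device memory

    for (size_t i = 0; i < n; ++i) h_buf[i] = (int)i;  // fill on the host

    cudaMemcpy(d_buf, h_buf, n * sizeof(int), cudaMemcpyHostToDevice);  // host -> device
    // ... launch kernels that work on d_buf ...
    cudaMemcpy(h_buf, d_buf, n * sizeof(int), cudaMemcpyDeviceToHost);  // device -> host

    cudaFree(d_buf);      // device memory is freed with cudaFree
    cudaFreeHost(h_buf);  // pinned host memory is freed with cudaFreeHost
    return 0;
}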
The CUDA in-kernel malloc() function allocates at least size bytes from the device heap and returns a pointer to the allocated memory or NULL if insufficient memory exists to fulfill the request.
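As a rough sketch of how that in-kernel allocator is used (device-side malloc requires compute capability 2.0 or later; the kernel name and heap size here are illustrative assumptions):

#include <cuda_runtime.h>

__global__ void scratch_kernel(void) {
    // Allocate a per-thread scratch buffer from the device heap.
    int *scratch = (int *)malloc(256 * sizeof(int));
    if (scratch == NULL) return;  // malloc returns NULL when the device heap is exhausted
    scratch[0] = threadIdx.x;
    free(scratch);                // must be released with the in-kernel free()
}

int main(void) {
    // Optionally enlarge the device heap before the first launch (8 MB here, arbitrary).
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8 * 1024 * 1024);
    scratch_kernel<<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}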
I found the problem. cudaHostAlloc and cudaMallocHost ARE NOT THE SAME.
For anyone who encounters this problem, the solution is to use
cudaHostAlloc(&test, 1024*sizeof(int), cudaHostAllocPortable);
instead of
cudaMallocHost(&test, 1024*sizeof(int));
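Applied to the repro from the question, the portable variant looks roughly like this (error checking omitted): cudaHostAllocPortable marks the pinned allocation as usable by all CUDA contexts, not just the one that allocated it, which is why it can be freed after switching devices.

#include <cuda.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    int *test;
    cudaError_t err;
    // Portable pinned memory is visible to every CUDA context, so it can be
    // allocated under one device and freed under another.
    err = cudaSetDevice(1);
    err = cudaHostAlloc((void **)&test, 1024 * sizeof(int), cudaHostAllocPortable);
    err = cudaSetDevice(0);
    err = cudaFreeHost(test);   // no cudaErrorInvalidValue this time
    return 0;
}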