I'm facing a simple problem: all my calls to cudaMalloc fail with an out-of-memory error, even if I'm allocating just a single byte.
The CUDA device is available and there is plenty of free memory (both checked with the corresponding calls).
Any idea what the problem could be?
Please try calling cudaSetDevice(), then cudaDeviceSynchronize(), and then cudaThreadSynchronize() at the very beginning of your code. Note that cudaThreadSynchronize() is deprecated in newer CUDA versions in favor of cudaDeviceSynchronize().
Use cudaSetDevice(0) if there is only one device. By default the CUDA runtime initializes device 0.
cudaSetDevice(0);
cudaDeviceSynchronize();
cudaThreadSynchronize();
Please report back what you observe. If it still fails, please specify the OS, architecture, CUDA SDK version, and CUDA driver version. If possible, please also post the code or code snippet that fails.
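To narrow down which call fails first, it may help to check the return value of every runtime call instead of only the cudaMalloc. Below is a minimal sketch of that idea. The helper name `check` is hypothetical, and device 0 is assumed to be the one in use.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print which runtime call failed and why.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    // Confirm that the runtime can see at least one device.
    int count = 0;
    if (!check(cudaGetDeviceCount(&count), "cudaGetDeviceCount")) return 1;
    std::printf("devices: %d\n", count);

    // Assumption: device 0 is the device being used.
    if (!check(cudaSetDevice(0), "cudaSetDevice")) return 1;
    if (!check(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;

    // Report free and total device memory as seen by the runtime.
    size_t freeBytes = 0, totalBytes = 0;
    if (!check(cudaMemGetInfo(&freeBytes, &totalBytes), "cudaMemGetInfo")) return 1;
    std::printf("free: %zu of %zu bytes\n", freeBytes, totalBytes);

    // The one-byte allocation from the question.
    void *p = nullptr;
    if (!check(cudaMalloc(&p, 1), "cudaMalloc")) return 1;
    check(cudaFree(p), "cudaFree");
    return 0;
}
```

If an earlier call such as cudaSetDevice already reports an error, the out-of-memory message from cudaMalloc is probably just a symptom of a failed initialisation.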
Thank you everybody for your help.
The problem was not really with cudaMalloc itself. It masked the real problem, which was that the initialisation of CUDA had failed.
Because the first call to CUDA was made in a separate thread, I didn't have a GL context available there, which led to the failures. I needed to make sure that CUDA was initialised in the main thread, by a dummy malloc, after the GL context had been created.
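For anyone hitting the same issue, here is a rough sketch of the ordering that worked, not my actual code. `createGLContext`, `initCuda` and `worker` are hypothetical names, and the GL setup is only a stub standing in for whatever the application really does.

```cuda
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical placeholder for the application's own OpenGL context setup.
static void createGLContext() {
    // window and GL context creation would go here
}

// Force CUDA initialisation with a dummy one-byte allocation.
static bool initCuda() {
    void *dummy = nullptr;
    cudaError_t err = cudaMalloc(&dummy, 1);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA init failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    cudaFree(dummy);
    return true;
}

// Worker thread: the real allocations happen here, after initialisation.
static void worker() {
    float *buf = nullptr;
    cudaError_t err = cudaMalloc(&buf, 1024 * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "worker cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return;
    }
    cudaFree(buf);
}

int main() {
    createGLContext();          // 1. create the GL context in the main thread
    if (!initCuda()) return 1;  // 2. initialise CUDA in the same thread
    std::thread t(worker);      // 3. only then start threads that use CUDA
    t.join();
    return 0;
}
```

The important part is the order in main: GL context first, then the dummy malloc, then the worker thread.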