I am wondering if there is a CUDA equivalent of the alloca function.
I need to create arrays of floats which act as the arguments to the mathematical function I am trying to optimize. The issue is that I don't really want to have to know the number of arguments at compile time, which is what I am doing now with templates. I could use the new operator in CUDA, but I suspect it would be slow (maybe I could preallocate the memory or something). I would use shared memory, but it is not big enough.
There is nothing I am aware of which works like alloca in CUDA. The stack frame in the CUDA ABI is statically sized by the assembler at compile time anyway, so I doubt there is any scope for dynamically allocating memory on the stack frame.
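The preallocation idea mentioned in the question is the usual workaround: allocate one global-memory scratch slice per thread on the host before launch, so the kernel never allocates anything itself and the argument count can be a runtime value. A minimal sketch (the names `evaluate`, `scratch`, and `num_args` are illustrative, not from the original post):

```cuda
#include <cuda_runtime.h>

__global__ void evaluate(float *scratch, int num_args, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Each thread owns a contiguous slice of the preallocated buffer,
    // so no in-kernel malloc/new is needed.
    float *args = scratch + (size_t)tid * num_args;
    for (int i = 0; i < num_args; ++i)
        args[i] = 0.0f;   // fill in the actual argument values here

    // ... evaluate the mathematical function using args ...
}

int main()
{
    int n = 1 << 20;   // number of threads
    int num_args = 8;  // known only at run time
    float *scratch = nullptr;
    cudaMalloc(&scratch, (size_t)n * num_args * sizeof(float));

    evaluate<<<(n + 255) / 256, 256>>>(scratch, num_args, n);
    cudaDeviceSynchronize();
    cudaFree(scratch);
    return 0;
}
```

Indexing by thread ID keeps each thread's slice disjoint, and a single `cudaMalloc` up front avoids the per-thread cost of the device-side heap allocator.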