CUDA exp(), expf(), and __expf()

Tags:

cuda

How can I optimize the exp function in CUDA? What are the differences between the following functions?

   exp()
   expf()
   __expf()

asked Aug 31 '11 13:08

user570593

2 Answers

The differences are explained in the CUDA C Programming Guide, appendix D.

exp() is intended for double precision, although it is also overloaded for single precision (float arguments).
expf() is the single-precision (float) version.
__expf() is the fast-math intrinsic: it is faster, with some loss of precision that depends on the input value (see the guide for details).
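
As an illustration of the three variants listed above (not part of the original answer), here is a minimal sketch that evaluates each one for a single input. The struct `ExpResults`, the kernel `exp_variants`, and the host wrapper `run_exp_variants` are hypothetical names chosen for this example, and error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper struct holding one result per variant.
struct ExpResults {
    double exp_double;   // exp() applied to a double
    float  expf_single;  // expf() applied to a float
    float  fast_single;  // __expf() intrinsic applied to a float
};

// Single-thread kernel: calls each variant once and stores the results.
__global__ void exp_variants(double xd, float xf, ExpResults *out)
{
    out->exp_double  = exp(xd);     // double-precision library function
    out->expf_single = expf(xf);    // single-precision library function
    out->fast_single = __expf(xf);  // fast intrinsic, callable from device code only
}

// Host wrapper (assumed name): runs the kernel for one value and copies
// the results back. Return codes of the CUDA calls are not checked here.
ExpResults run_exp_variants(float x)
{
    ExpResults *d_out = nullptr;
    ExpResults h_out = {};
    cudaMalloc(&d_out, sizeof(ExpResults));
    exp_variants<<<1, 1>>>(static_cast<double>(x), x, d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(ExpResults), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Calling `run_exp_variants(1.0f)` from `main` and printing the three fields with enough digits should make the precision difference between the double and float results visible; whether the `__expf()` result differs from the `expf()` result depends on the input and on the compiler flags used.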

answered Oct 01 '22 10:10

Tom

Generally, exp() is for doubles and expf() is for floats; both are somewhat slower than __expf(), a device intrinsic that maps to native hardware instructions. The performance gain usually comes at the cost of accuracy, but unless your computation is sensitive to small errors, it shouldn't be a problem.
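
One way to judge whether the accuracy loss matters for a given input range is to measure it. The sketch below (an addition, not from the original answer) compares `__expf()` against `expf()` at evenly spaced sample points and returns the largest relative error. The kernel name `exp_rel_error` and the host helper `max_fast_exp_rel_error` are hypothetical, the helper assumes `n >= 2`, and error checking of the CUDA calls is omitted.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Each thread evaluates both variants at one sample point in [lo, hi]
// and stores the relative error of the fast intrinsic.
__global__ void exp_rel_error(float lo, float hi, int n, float *rel_err)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = lo + (hi - lo) * i / (n - 1);  // assumes n >= 2
    float accurate = expf(x);    // library version, used as the reference
    float fast     = __expf(x);  // fast intrinsic
    rel_err[i] = fabsf(fast - accurate) / accurate;
}

// Host helper (assumed name): largest relative error over n samples.
// If the file is compiled with -use_fast_math, expf() is itself replaced
// by __expf(), so the measured error would be zero.
float max_fast_exp_rel_error(float lo, float hi, int n)
{
    float *d_err = nullptr;
    cudaMalloc(&d_err, n * sizeof(float));

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    exp_rel_error<<<blocks, threads>>>(lo, hi, n, d_err);

    std::vector<float> h_err(n);
    cudaMemcpy(h_err.data(), d_err, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_err);

    float worst = 0.0f;
    for (float e : h_err) worst = fmaxf(worst, e);
    return worst;
}
```

For example, `max_fast_exp_rel_error(-10.0f, 10.0f, 1 << 16)` gives a rough picture for moderate arguments; widening the range should show the error growing with the magnitude of the input. If the measured error is acceptable everywhere in a program, compiling with nvcc's `-use_fast_math` option applies the substitution to all `expf()` calls without source changes.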

answered Oct 01 '22 12:10

Dan
