Is a fast implementation of pow(x, 0.5f) faster than a fast sqrt(x)?

Tags: c++, performance, c, math

I'm wondering whether a fast implementation of pow(), for example this one, is a faster way to get the square root of an integer than a fast sqrt(x). We know that

sqrt(x) = pow(x, 0.5f)

I cannot test the speed myself because I did not find a fast implementation of sqrt. My question is: is a fast implementation of pow(x, 0.5f) faster than a fast sqrt(x)?

Edit: I meant powf, the version of pow that takes floats instead of doubles. (doubles are more misleading)

200

asked Aug 04 '12 17:08

Zaffy

1 Answer

With regard to C standard library sqrt and pow, the answer is no.

First, if pow(x, .5f) were faster than an implementation of sqrt(x), the engineer assigned to maintain sqrt would replace the implementation with pow(x, .5f).

Second, implementations of sqrt in commercial libraries are typically optimized specifically to perform that task, often by people who are knowledgeable about writing high-performance software and who write in or near assembly language to get the best performance available from the processor.

Third, many processors have instructions to perform sqrt or to assist in calculating it. (Commonly, there is an instruction to provide an estimate of the reciprocal of the square root and an instruction to refine that estimate.)
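To make the estimate-then-refine idea concrete, here is a minimal sketch in portable C rather than with any particular processor instruction. It assumes the rough reciprocal-square-root estimate comes from somewhere else (a hardware instruction, a table, or a guess). The names refine_rsqrt and sqrt_from_estimate are hypothetical and chosen only for illustration.

```c
/* One Newton-Raphson step for y ~ 1/sqrt(x):
   y' = y * (1.5 - 0.5 * x * y * y).
   Each step roughly doubles the number of correct bits. */
static float refine_rsqrt(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}

/* Hypothetical helper: refine a rough estimate of 1/sqrt(x) a fixed
   number of times, then multiply by x to obtain sqrt(x). */
static float sqrt_from_estimate(float x, float estimate, int steps)
{
    float y = estimate;
    for (int i = 0; i < steps; ++i)
        y = refine_rsqrt(x, y);
    return x * y; /* x * (1/sqrt(x)) == sqrt(x) */
}
```

Starting from a poor guess such as 0.5 for x = 3, a handful of steps should already land close to 1.73205. A hardware estimate starts much closer, so it would need fewer steps.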

However

The code you linked, and the question you asked, are about approximating sqrt crudely by way of a crudely approximated pow.

I converted the final version of the pow approximation routine referred to in the question to C and measured its run time when computing pow(3, .5). I also measured the run time of the system (Mac OS X 10.8) pow and sqrt, and of the sqrt approximation here (with one iteration and multiplying by the argument at the end to get the square root, rather than its inverse).
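For readers unfamiliar with this class of trick, the following is a rough sketch of how a crude pow can be built from the bit pattern of a float. It is not the routine linked from the question and does not reproduce the figures quoted in this answer. It assumes IEEE-754 32-bit floats and a positive, finite x, and the name approx_powf is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Crude pow for positive x: the integer bit pattern of a float is
   roughly a scaled and offset log2(x), so scaling its distance from
   the bit pattern of 1.0f by p approximates x raised to the power p. */
static float approx_powf(float x, float p)
{
    const int32_t one = 0x3f800000; /* bit pattern of 1.0f */
    int32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* read the float's bits */
    bits = (int32_t)(p * (float)(bits - one)) + one;
    memcpy(&x, &bits, sizeof bits);  /* write the adjusted bits back */
    return x;
}
```

With no correction term, this version gives 1.75 for approx_powf(3.0f, 0.5f) against a true value of about 1.732, and 1.5 for the square root of 2, so the error can reach several percent.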

First, the computed results: The pow approximation returns 1.72101. The sqrt approximation returns 1.73054. The correct value, returned by the system pow and sqrt, is 1.73205.
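As an illustration of a square root approximation of this kind, here is a sketch that assumes the approximation is the well-known bit-level inverse square root with the constant 0x5f3759df. It uses one Newton-Raphson iteration and multiplies by the argument at the end. It also assumes IEEE-754 32-bit floats and a positive, finite input, and the name approx_sqrt is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Approximate sqrt(x) via a bit-level estimate of 1/sqrt(x). */
static float approx_sqrt(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);     /* reinterpret the float's bits */
    bits = 0x5f3759dfu - (bits >> 1);   /* initial guess for 1/sqrt(x) */
    memcpy(&y, &bits, sizeof y);

    y = y * (1.5f - 0.5f * x * y * y);  /* one Newton-Raphson iteration */
    return x * y;                       /* x * (1/sqrt(x)) == sqrt(x) */
}
```

Under those assumptions, approx_sqrt(3.0f) comes out at roughly 1.7305, which is consistent with the value reported above for the sqrt approximation.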

Running in 64-bit mode on a MacPro4,1, the pow approximation takes about 6 cycles, the system pow takes 29 cycles, the square root approximation takes 10 cycles, and the system sqrt takes 29 cycles. These times may include some overhead for loading arguments and storing results (I used volatile variables to force the compiler not to optimize away otherwise useless loop iterations, so that I could measure them).

(These times are “effective throughput”, in effect the number of CPU cycles from when one call begins to when another can begin.)
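A measurement loop along these lines can be sketched as follows. This is not the harness used for the numbers above. It assumes clock() is precise enough once the iteration count is large, and the names seconds_per_call, baseline, bench_input and bench_output are hypothetical.

```c
#include <time.h>

typedef float (*unary_fn)(float);

/* volatile forces a real load of the argument and a real store of the
   result on every iteration, so the compiler cannot delete the loop. */
static volatile float bench_input = 3.0f;
static volatile float bench_output;

/* Does no arithmetic: timing it estimates the loop and call overhead. */
static float baseline(float x)
{
    return x;
}

/* Average CPU time per call of fn, in seconds. */
static double seconds_per_call(unary_fn fn, long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i)
        bench_output = fn(bench_input);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / (double)iterations;
}
```

To compare candidates, wrap each one (for example sqrtf(x) and powf(x, 0.5f)) in a function with this signature, time it over many iterations, and subtract the baseline figure. Multiplying the per-call time by the CPU clock frequency gives an approximate cycle count. Because each call here does not depend on the previous result, this measures throughput rather than latency.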

169

answered Nov 15 '22 14:11

Eric Postpischil
