The overloaded function `float pow(float base, int iexp)` was removed in C++11, and `pow` now returns a `double`. In my program I am computing lots of these (in single precision), and I am interested in the most efficient way to do it.

Is there some special function (in the standard libraries or elsewhere) with the above signature? If not, is it better (in terms of single-precision performance) to explicitly cast the result of `pow` to `float` before any other operations (otherwise those operations would promote everything to `double`), or to cast `iexp` to `float` and use the overloaded function `float pow(float base, float exp)`?
EDIT: Why do I need `float` and not `double`? The primary reason is RAM: I need tens or hundreds of GB, so this size reduction is a huge advantage. Thus I need to get a `float` from a `float`, and I am looking for the most efficient way to achieve that (fewer casts, already-optimized algorithms, etc.).
Another question that can only be honestly answered with "wrong question". Or at least: "Are you really willing to go there?". `float` theoretically needs ca. 80% less die space (for the same number of cycles) and so can be much cheaper for bulk processing. GPUs love `float` for this reason.
However, let's look at x86 (admittedly, you didn't say what architecture you're on, so I picked the most common). The price in die space has already been paid; you literally gain nothing by using `float` for calculations. Actually, you may even lose throughput, because additional extensions from `float` to `double` are required, plus additional rounding to intermediate `float` precision. In other words, you pay extra to get a less accurate result. This is typically something to avoid, except maybe when you need maximum compatibility with some other program.
See Jens' comment as well. These options give the compiler permission to disregard some language rules to achieve higher performance. Needless to say this can sometimes backfire.
There are two scenarios where `float` might be more efficient, on x86:

- GPGPU: GPUs often don't even support `double`, and if they do, it's usually much slower. Yet, you will only notice when doing very many calculations of this sort, and you'd know if you did GPGPU.
- Explicit vectorization by using compiler intrinsics is also a choice – one you could make, for sure, but this requires quite a cost-benefit analysis. Possibly your compiler is able to auto-vectorize some loops, but this is usually limited to "obvious" applications, such as multiplying each number in a `vector<float>` by another `float`, and this case is not so obvious IMO. Even if you `pow` each number in such a vector by the same `int`, the compiler may not be smart enough to vectorize this effectively, especially if `pow` resides in another translation unit, without effective link-time code generation.
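To make the loop in question concrete, here is a hypothetical helper (`pow_all` is a made-up name) that raises every element of a `vector<float>` to the same integer power; whether the compiler vectorizes the libm call depends on the toolchain and math flags:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: raise each element of v to the same integer power.
// Auto-vectorization of the std::pow call is not guaranteed; it typically
// requires fast-math-style flags and a vectorized math library.
void pow_all(std::vector<float>& v, int iexp) {
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = std::pow(v[i], static_cast<float>(iexp));
}
```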
If you are not ready to consider changing the whole structure of your program to allow effective use of SIMD (including GPGPU), and you're not on an architecture where `float` is indeed much cheaper by default, I suggest you stick with `double` by all means, and consider `float` at best a storage format that may be useful to conserve RAM or to improve cache locality (when you have a lot of them). Even then, measuring is an excellent idea.
That said, you could try ivaigult's algorithm (only with `double` for the intermediate and for the result), which is related to a classical algorithm called Egyptian multiplication (among a variety of other names), except that the operands are multiplied rather than added. I don't know how `pow(double, double)` works exactly, but it is conceivable that this algorithm could be faster in some cases. Again, you should be OCD about benchmarking.
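A sketch of that idea, under the assumption that the algorithm referred to is binary exponentiation (square-and-multiply); `fpow` is an illustrative name, the intermediates run in `double` as suggested above, and negative exponents are omitted for brevity:

```cpp
// Square-and-multiply ("Egyptian multiplication" with * in place of +):
// float in, float out, double intermediates. Non-negative exponents only.
float fpow(float base, unsigned iexp) {
    double result = 1.0;
    double b = base;
    while (iexp != 0) {
        if (iexp & 1u)   // this bit of the exponent is set
            result *= b;
        b *= b;          // square the running power of the base
        iexp >>= 1;
    }
    return static_cast<float>(result);
}
```

This needs only O(log iexp) multiplications, and keeping the intermediates in `double` avoids most of the rounding error from repeated single-precision multiplies.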