The overloaded function `float pow(float base, int iexp)` was removed in C++11, and `pow` now returns a `double`. In my program I am computing lots of these (in single precision), and I am interested in the most efficient way to do it.

Is there some special function (in the standard libraries or elsewhere) with the above signature? If not, is it better (in terms of single-precision performance) to explicitly cast the result of `pow` to `float` before any other operations (otherwise those operations would promote everything to `double`), or to cast `iexp` to `float` and use the overloaded function `float pow(float base, float exp)`?
EDIT: Why do I need `float` and not `double`? The primary reason is RAM: I need tens or hundreds of GB, so this size reduction is a huge advantage. Thus I need to get a `float` from a `float`, and I am looking for the most efficient way to achieve that (fewer casts, already-optimized algorithms, etc.).
Another question that can only be honestly answered with "wrong question". Or at least: "Are you really willing to go there?". `float` theoretically needs ca. 80% less die space (for the same number of cycles) and so can be much cheaper for bulk processing. GPUs love `float` for this reason.
However, let's look at x86 (admittedly, you didn't say what architecture you're on, so I picked the most common). The price in die space has already been paid; you literally gain nothing by using `float` for calculations. Actually, you may even lose throughput, because additional extensions from `float` to `double` are required, plus additional rounding to intermediate `float` precision. In other words, you pay extra to get a less accurate result. This is typically something to avoid, except maybe when you need maximum compatibility with some other program.
See Jens' comment as well. These options give the compiler permission to disregard some language rules to achieve higher performance. Needless to say this can sometimes backfire.
There are two scenarios where `float` might be more efficient, on x86:

- GPGPU: GPUs often don't even support `double`, and if they do, it's usually much slower. Yet, you will only notice when doing very many calculations of this sort, and you'd know if you did GPGPU.
- Explicit vectorization by using compiler intrinsics is also a choice – one you could make, for sure, but this requires quite a cost-benefit analysis. Possibly your compiler is able to auto-vectorize some loops, but this is usually limited to "obvious" applications, such as multiplying each number in a `vector<float>` by another `float`, and this case is not so obvious IMO. Even if you `pow` each number in such a vector by the same `int`, the compiler may not be smart enough to vectorize this effectively, especially if `pow` resides in another translation unit, without effective link-time code generation.
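To make the loop in question concrete, here is a hypothetical helper (`pow_all` is a made-up name) that raises every element of a `vector<float>` to the same integer power; whether the compiler vectorizes the libm call depends on the toolchain and math flags:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: raise each element of v to the same integer power.
// Auto-vectorization of the std::pow call is not guaranteed; it typically
// requires fast-math-style flags and a vectorized math library.
void pow_all(std::vector<float>& v, int iexp) {
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = std::pow(v[i], static_cast<float>(iexp));
}
```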
If you are not ready to consider changing the whole structure of your program to allow effective use of SIMD (including GPGPU), and you're not on an architecture where `float` is indeed much cheaper by default, I suggest you stick with `double` by all means, and consider `float` at best a storage format that may be useful to conserve RAM or to improve cache locality (when you have a lot of them). Even then, measuring is an excellent idea.
That said, you could try ivaigult's algorithm (only with `double` for the intermediate and for the result), which is related to a classical algorithm called Egyptian multiplication (among a variety of other names), except that the operands are multiplied rather than added. I don't know how `pow(double, double)` works exactly, but it is conceivable that this algorithm could be faster in some cases. Again, you should be OCD about benchmarking.
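A sketch of that idea, under the assumption that the algorithm referred to is binary exponentiation (square-and-multiply); `fpow` is an illustrative name, the intermediates run in `double` as suggested above, and negative exponents are omitted for brevity:

```cpp
// Square-and-multiply ("Egyptian multiplication" with * in place of +):
// float in, float out, double intermediates. Non-negative exponents only.
float fpow(float base, unsigned iexp) {
    double result = 1.0;
    double b = base;
    while (iexp != 0) {
        if (iexp & 1u)   // this bit of the exponent is set
            result *= b;
        b *= b;          // square the running power of the base
        iexp >>= 1;
    }
    return static_cast<float>(result);
}
```

This needs only O(log iexp) multiplications, and keeping the intermediates in `double` avoids most of the rounding error from repeated single-precision multiplies.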