Fast multiplication/division by 2 for floats and doubles (C/C++)

Tags:

In the software I'm writing, I'm doing millions of multiplication or division by 2 (or powers of 2) of my values. I would really like these values to be int so that I could access the bitshift operators

int a = 1; int b = a<<24

However, I cannot, and I have to stick with doubles.

My question is : as there is a standard representation of doubles (sign, exponent, mantissa), is there a way to play with the exponent to get fast multiplications/divisions by a power of 2?

I can even assume that the number of bits is going to be fixed (the software will work on machines that will always have 64 bits long doubles)

P.S : And yes, the algorithm mostly does these operations only. This is the bottleneck (it's already multithreaded).

Edit : Or am I completely mistaken and clever compilers already optimize things for me?

Temporary results (with Qt to measure time, overkill, but I don't care):

#include <QtCore/QCoreApplication> #include <QtCore/QElapsedTimer> #include <QtCore/QDebug>  #include <iostream> #include <math.h>  using namespace std;  int main(int argc, char *argv[]) { QCoreApplication a(argc, argv);  while(true) {     QElapsedTimer timer;     timer.start();      int n=100000000;     volatile double d=12.4;     volatile double D;     for(unsigned int i=0; i<n; ++i)     {         //D = d*32;      // 200 ms         //D = d*(1<<5);  // 200 ms         D = ldexp (d,5); // 6000 ms     }      qDebug() << "The operation took" << timer.elapsed() << "milliseconds"; }  return a.exec(); }

Runs suggest that D = d*(1<<5); and D = d*32; run in the same time (200 ms) whereas D = ldexp (d,5); is much slower (6000 ms). I know that this is a micro benchmark, and that suddenly, my RAM has exploded because Chrome has suddenly asked to compute Pi in my back every single time I run ldexp(), so this benchmark is worth nothing. But I'll keep it nevertheless.

On the other had, I'm having trouble doing reinterpret_cast<uint64_t *> because there's a const violation (seems the volatile keyword interferes)

384

asked Oct 11 '11 02:10

B. Decoster

2 Answers

This is one of those highly-application specific things. It may help in some cases and not in others. (In the vast majority of cases, a straight-forward multiplication is still best.)

The "intuitive" way of doing this is just to extract the bits into a 64-bit integer and add the shift value directly into the exponent. (this will work as long as you don't hit NAN or INF)

So something like this:

union{     uint64 i;     double f; };  f = 123.; i += 0x0010000000000000ull;  //  Check for zero. And if it matters, denormals as well.

Note that this code is not C-compliant in any way, and is shown just to illustrate the idea. Any attempt to implement this should be done directly in assembly or SSE intrinsics.

However, in most cases the overhead of moving the data from the FP unit to the integer unit (and back) will cost much more than just doing a multiplication outright. This is especially the case for pre-SSE era where the value needs to be stored from the x87 FPU into memory and then read back into the integer registers.

In the SSE era, the Integer SSE and FP SSE use the same ISA registers (though they still have separate register files). According the Agner Fog, there's a 1 to 2 cycle penalty for moving data between the Integer SSE and FP SSE execution units. So the cost is much better than the x87 era, but it's still there.

All-in-all, it will depend on what else you have on your pipeline. But in most cases, multiplying will still be faster. I've run into this exact same problem before so I'm speaking from first-hand experience.

Now with 256-bit AVX instructions that only support FP instructions, there's even less of an incentive to play tricks like this.

167

answered Sep 22 '22 02:09

Mysticial

How about ldexp?

Any half-decent compiler will generate optimal code on your platform.

But as @Clinton points out, simply writing it in the "obvious" way should do just as well. Multiplying and dividing by powers of two is child's play for a modern compiler.

Directly munging the floating point representation, besides being non-portable, will almost certainly be no faster (and might well be slower).

And of course, you should not waste time even thinking about this question unless your profiling tool tells you to. But the kind of people who listen to this advice will never need it, and the ones who need it will never listen.

[update]

OK, so I just tried ldexp with g++ 4.5.2. The cmath header inlines it as a call to __builtin_ldexp, which in turn...

...emits a call to the libm ldexp function. I would have thought this builtin would be trivial to optimize, but I guess the GCC developers never got around to it.

So, multiplying by 1 << p is probably your best bet, as you have discovered.

answered Sep 21 '22 02:09

Nemo

Related questions
                            
                                Intel AVX: 256-bits version of dot product for double precision floating point variables
                            
                                What are the uses of the type `std::nullptr_t`?
                            
                                Do I have to use the OpenGL data types (GLint, CLchar, ...) for a cross platform game?
                            
                                Writing Python bindings for C++ code that use OpenCV
                            
                                Append to a File with fstream Instead of Overwriting
                            
                                What does a>?=b mean?
                            
                                Can I comment multi-line macros?
                            
                                Passing functions in C++
                            
                                What does this mean const int*& var?
                            
                                How to set background color of window after I have registered it?
                            
                                Why is template argument deduction disabled with std::forward?
                            
                                LNK2019: unresolved external symbol _main referenced in function ___tmainCRTStartup
                            
                                avoiding the first newline in a C++11 raw string literal?
                            
                                what happens when i mix signed and unsigned types ?
                            
                                How is nth_element Implemented?
                            
                                Why does C++ allow you to move classes containing objects with deleted move operations?
                            
                                Make integer sequence unique at compile time
                            
                                Is it a good idea to return " const char * " from a function?
                            
                                Why default argument cannot be specified for an explicit template specialization?
                            
                                When do we need to have a default constructor?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Fast multiplication/division by 2 for floats and doubles (C/C++)

Tags:

c++

c

optimization

division

multiplication

B. Decoster

People also ask

2 Answers

Mysticial

Nemo

Recent Activity

Donate For Us