
Floating point vs integer calculations on modern hardware

I am doing some performance-critical work in C++, and we are currently using integer calculations for problems that are inherently floating point, because "it's faster". This causes a whole lot of annoying problems and adds a lot of annoying code.

Now, I remember reading about how floating point calculations were so slow back around the 386 days, when (IIRC) there was an optional co-processor. But surely nowadays, with exponentially more complex and powerful CPUs, it makes no difference in "speed" whether you do floating point or integer calculations? Especially since the actual calculation time is tiny compared to something like a pipeline stall or a fetch from main memory?

I know the correct answer is to benchmark on the target hardware. What would be a good way to test this? I wrote two tiny C++ programs and compared their run times with "time" on Linux, but the actual run time is too variable (it doesn't help that I am running on a virtual server). Short of spending my entire day running hundreds of benchmarks, making graphs, etc., is there something I can do to get a reasonable test of the relative speed? Any ideas or thoughts? Am I completely wrong?
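One thought I had (I am not sure whether it is a sound approach) is to time just the work loop inside the program itself rather than the whole process, repeat the measurement a few times, and keep the fastest run to reduce the noise from the virtual server. A rough sketch of what I mean:

#include <chrono>
#include <cstdio>
#include <cstdlib>

// Sketch only: time just the work, repeat a few times, and keep the best
// run to reduce noise from the (virtual) machine.
template <typename Fn>
double best_of(int runs, Fn work)
{
    double best = 1e300;
    for (int r = 0; r < runs; ++r)
    {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop  = std::chrono::steady_clock::now();
        double secs = std::chrono::duration<double>(stop - start).count();
        if (secs < best) best = secs;
    }
    return best;
}

int main()
{
    srand(12345);  // fixed seed so every variant sees the same inputs
    double t = best_of(5, []{
        int accum = 0;
        for (unsigned int i = 0; i < 100000000; ++i)
            accum += rand() % 365;
        // keep accum "used" so the loop is not optimized away
        if (accum == 42) puts("unlikely");
    });
    printf("best of 5 runs: %f s\n", t);
    return 0;
}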

The programs I used are as follows; they are by no means identical:

#include <iostream>
#include <cmath>
#include <cstdlib>
#include <time.h>

int main( int argc, char** argv )
{
    int accum = 0;

    srand( time( NULL ) );

    for( unsigned int i = 0; i < 100000000; ++i )
    {
        accum += rand( ) % 365;
    }
    std::cout << accum << std::endl;

    return 0;
}

Program 2:

#include <iostream>
#include <cmath>
#include <cstdlib>
#include <time.h>

int main( int argc, char** argv )
{
    float accum = 0;

    srand( time( NULL ) );

    for( unsigned int i = 0; i < 100000000; ++i )
    {
        accum += (float)( rand( ) % 365 );
    }
    std::cout << accum << std::endl;

    return 0;
}

Thanks in advance!

Edit: The platform I care about is regular x86 or x86-64 running on desktop Linux and Windows machines.

Edit 2 (pasted from a comment below): We have an extensive code base currently. Really, I have come up against the generalization that we "must not use float since integer calculation is faster" - and I am looking for a way (if this is even true) to disprove this generalized assumption. I realize that it would be impossible to predict the exact outcome for us, short of doing all the work and profiling it afterwards.

Anyway, thanks for all your excellent answers and help. Feel free to add anything else :).

asked Mar 31 '10 by maxpenguin


1 Answer

For example (lower numbers are faster):

64-bit Intel Xeon X5550 @ 2.67GHz, gcc 4.1.2 -O3

short add/sub: 1.005460 [0]
short mul/div: 3.926543 [0]
long add/sub: 0.000000 [0]
long mul/div: 7.378581 [0]
long long add/sub: 0.000000 [0]
long long mul/div: 7.378593 [0]
float add/sub: 0.993583 [0]
float mul/div: 1.821565 [0]
double add/sub: 0.993884 [0]
double mul/div: 1.988664 [0]

32-bit Dual Core AMD Opteron(tm) Processor 265 @ 1.81GHz, gcc 3.4.6 -O3

short add/sub: 0.553863 [0]
short mul/div: 12.509163 [0]
long add/sub: 0.556912 [0]
long mul/div: 12.748019 [0]
long long add/sub: 5.298999 [0]
long long mul/div: 20.461186 [0]
float add/sub: 2.688253 [0]
float mul/div: 4.683886 [0]
double add/sub: 2.700834 [0]
double mul/div: 4.646755 [0]

As Dan pointed out, even once you normalize for clock frequency (which can be misleading in itself in pipelined designs), results will vary wildly based on CPU architecture: individual ALU/FPU performance, as well as the actual number of ALUs/FPUs available per core in superscalar designs, which influences how many independent operations can execute in parallel. That last factor is not exercised by the code below, since all of its operations are sequentially dependent.
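To see how much that last factor can matter, here is a rough sketch (not part of the benchmark below, and the 4-way split is an arbitrary choice) comparing one long dependent chain of float additions against the same number of additions spread across four independent accumulators; on most modern x86 cores the second loop should complete noticeably faster per operation, because the FPU can overlap the independent chains:

#include <cstdio>
#include <cstdlib>
#include <ctime>

// Sketch only: both loops perform the same number of float additions, but
// the second keeps four independent dependency chains that a pipelined /
// superscalar FPU can overlap, illustrating why the sequentially dependent
// benchmark below measures latency rather than peak throughput.
int main()
{
    const size_t N = 100000000;
    float step = (float)(rand() % 256) / 256.0f + 1.0f;

    clock_t t0 = clock();
    float dep = 0.0f;
    for (size_t i = 0; i < N; ++i)
        dep += step;                       // each add waits on the previous one
    double dep_s = (double)(clock() - t0) / CLOCKS_PER_SEC;

    t0 = clock();
    float a = 0.0f, b = 0.0f, c = 0.0f, d = 0.0f;
    for (size_t i = 0; i < N; i += 4)
    {
        a += step;                         // four chains, independent of each other
        b += step;
        c += step;
        d += step;
    }
    double ind_s = (double)(clock() - t0) / CLOCKS_PER_SEC;

    // print the sums so the compiler cannot drop either loop
    printf("dependent:   %f s (sum %f)\n", dep_s, dep);
    printf("independent: %f s (sum %f)\n", ind_s, a + b + c + d);
    return 0;
}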

Poor man's FPU/ALU operation benchmark:

#include <stdio.h>
#ifdef _WIN32
#include <sys/timeb.h>
#else
#include <sys/time.h>
#endif
#include <time.h>
#include <cstdlib>

double mygettime(void) {
# ifdef _WIN32
  struct _timeb tb;
  _ftime(&tb);
  return (double)tb.time + (0.001 * (double)tb.millitm);
# else
  struct timeval tv;
  if(gettimeofday(&tv, 0) < 0) {
    perror("oops");
  }
  return (double)tv.tv_sec + (0.000001 * (double)tv.tv_usec);
# endif
}

template< typename Type >
void my_test(const char* name) {
  Type v  = 0;
  // Do not use constants or repeating values
  //  to avoid loop unroll optimizations.
  // All values >0 to avoid division by 0
  // Perform ten ops/iteration to reduce
  //  impact of ++i below on measurements
  Type v0 = (Type)(rand() % 256)/16 + 1;
  Type v1 = (Type)(rand() % 256)/16 + 1;
  Type v2 = (Type)(rand() % 256)/16 + 1;
  Type v3 = (Type)(rand() % 256)/16 + 1;
  Type v4 = (Type)(rand() % 256)/16 + 1;
  Type v5 = (Type)(rand() % 256)/16 + 1;
  Type v6 = (Type)(rand() % 256)/16 + 1;
  Type v7 = (Type)(rand() % 256)/16 + 1;
  Type v8 = (Type)(rand() % 256)/16 + 1;
  Type v9 = (Type)(rand() % 256)/16 + 1;

  double t1 = mygettime();
  for (size_t i = 0; i < 100000000; ++i) {
    v += v0;
    v -= v1;
    v += v2;
    v -= v3;
    v += v4;
    v -= v5;
    v += v6;
    v -= v7;
    v += v8;
    v -= v9;
  }
  // Pretend we make use of v so compiler doesn't optimize out
  //  the loop completely
  printf("%s add/sub: %f [%d]\n", name, mygettime() - t1, (int)v&1);

  t1 = mygettime();
  for (size_t i = 0; i < 100000000; ++i) {
    v /= v0;
    v *= v1;
    v /= v2;
    v *= v3;
    v /= v4;
    v *= v5;
    v /= v6;
    v *= v7;
    v /= v8;
    v *= v9;
  }
  // Pretend we make use of v so compiler doesn't optimize out
  //  the loop completely
  printf("%s mul/div: %f [%d]\n", name, mygettime() - t1, (int)v&1);
}

int main() {
  my_test< short >("short");
  my_test< long >("long");
  my_test< long long >("long long");
  my_test< float >("float");
  my_test< double >("double");

  return 0;
}
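(Note that the template above covers short, long, long long, float and double, but not plain int; if you also want numbers for int, it should just be a matter of adding extra instantiations to main(), for example:)

  my_test< int >("int");                    // hypothetical extra instantiations
  my_test< unsigned int >("unsigned int");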
answered Sep 23 '22 by vladr