Consider the following code snippets in C++ (Visual Studio 2015):
First Block
const int size = 500000000;
int sum = 0;
int *num1 = new int[size]; // initialized between 1-250
int *num2 = new int[size]; // initialized between 1-250
for (int i = 0; i < size; i++)
{
    sum += (num1[i] / num2[i]);
}
Second Block
const int size = 500000000;
int sum = 0;
float *num1 = new float[size]; // initialized between 1-250
float *num2 = new float[size]; // initialized between 1-250
for (int i = 0; i < size; i++)
{
    sum += (num1[i] / num2[i]);
}
I expected the first block to run faster because it uses integer operations, but the second block is considerably faster, even though it uses floating-point operations. Here are the results of my benchmark:
Division:
Type Time
uint8 879.5ms
uint16 885.284ms
int 982.195ms
float 654.654ms
Floating-point multiplication is also faster than integer multiplication. Here are the results of my benchmark:
Multiplication:
Type Time
uint8 166.339ms
uint16 524.045ms
int 432.041ms
float 402.109ms
My system specs: CPU Core i7-7700, RAM 64 GB, Visual Studio 2015.
Often floating-point multiply is faster than integer multiply (because floating-point multiply is used more often, so CPU designers spend more effort optimising that path). Floating-point divide may well be slow (often more than 10 cycles), but then so is integer divide.
Floating-point division is faster than integer division because of the exponent part of the floating-point representation: to divide one exponent by another, a plain subtraction is used.
int32_t division requires fast division of 31-bit numbers, whereas float division requires fast division of 24-bit mantissas (the leading one in the mantissa is implied and not stored in a floating-point number) and a fast subtraction of 8-bit exponents.
See an excellent detailed explanation of how division is performed in a CPU.
It may be worth mentioning that SSE and AVX instructions provide floating-point division but no integer division. SSE instructions/intrinsics can easily be used to quadruple the speed of your float calculation.
If you look into Agner Fog's instruction tables, for example, for Skylake, the latency of the 32-bit integer division is 26 CPU cycles, whereas the latency of the SSE scalar float division is 11 CPU cycles (and, surprisingly, it takes the same time to divide four packed floats).
Also note that in C and C++ there is no division on types narrower than int, so uint8_t and uint16_t are first promoted to int and the division then happens on ints. uint8_t division looks faster than int division because the values have fewer bits set when converted to int, which causes the (data-dependent) division to complete faster.