Consider the following code snippets in C++ (Visual Studio 2015):
First Block
const int size = 500000000;
int sum = 0;
int *num1 = new int[size]; // initialized between 1-250
int *num2 = new int[size]; // initialized between 1-250
for (int i = 0; i < size; i++)
{
    sum += (num1[i] / num2[i]);
}
Second Block
const int size = 500000000;
int sum = 0;
float *num1 = new float[size]; // initialized between 1-250
float *num2 = new float[size]; // initialized between 1-250
for (int i = 0; i < size; i++)
{
    sum += (num1[i] / num2[i]);
}
I expected the first block to run faster because it uses integer operations, but the second block is considerably faster, even though it uses floating-point operations. Here are the results of my benchmark:
Division:
Type Time
uint8 879.5ms
uint16 885.284ms
int 982.195ms
float 654.654ms
Floating-point multiplication is also faster than integer multiplication. Here are the results of my benchmark:
Multiplication:
Type Time
uint8 166.339ms
uint16 524.045ms
int 432.041ms
float 402.109ms
My system specs: CPU Core i7-7700, RAM 64 GB, Visual Studio 2015.
Often floating-point multiply is faster than integer multiply (because floating-point multiply is used more often, so CPU designers spend more effort optimising that path). Floating-point divide may well be slow (often more than 10 cycles), but then so is integer divide.
Floating-point division is faster than integer division because of the exponent part of the floating-point representation: to divide one exponent by another, a plain subtraction is used.
int32_t division requires fast division of 31-bit numbers, whereas float division requires fast division of 24-bit mantissas (the leading one in the mantissa is implied and not stored in a floating-point number) and a fast subtraction of 8-bit exponents.
See an excellent detailed explanation of how division is performed in a CPU.
It may be worth mentioning that SSE and AVX instructions provide floating-point division but no integer division. SSE instructions/intrinsics can easily be used to quadruple the speed of your float calculation.
If you look into Agner Fog's instruction tables, for example, for Skylake, the latency of the 32-bit integer division is 26 CPU cycles, whereas the latency of the SSE scalar float division is 11 CPU cycles (and, surprisingly, it takes the same time to divide four packed floats).
Also note that in C and C++ there is no division on types narrower than int, so uint8_t and uint16_t are first promoted to int and the division then happens on ints. uint8_t division looks faster than int division because the values have fewer bits set when converted to int, which causes the (data-dependent) division to complete faster.