Double values store higher precision and are double the size of a float, but are Intel CPUs optimized for floats?
That is, are double operations just as fast or faster than float operations for +, -, *, and /?
Does the answer change for 64-bit architectures?
Floats are faster than doubles when you don't need double's precision and you are memory-bandwidth bound and your hardware doesn't carry a penalty on floats. They conserve memory-bandwidth because they occupy half the space per number.
A double is 64 bits wide, twice the 32 bits of a float, so it is more precise. When values are large or need many significant digits, double is preferred over float.
double has roughly twice the precision of float (a 53-bit significand versus 24 bits). float is a 32-bit IEEE 754 single-precision floating-point number: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the fraction (plus an implicit leading bit). float has about 7 decimal digits of precision.
Both double and float can be used to represent floating-point numbers in Java. double is preferred over float when a more precise result is required. The precision of double is about 15 to 16 decimal digits, while the precision of float is only around 6 to 7 decimal digits.
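The precision figures above can be checked directly. The following is a minimal C++ sketch, assuming a platform where float and double are the IEEE 754 32-bit and 64-bit formats; the helper names (absorbs_one_float, absorbs_one_double, print_precision_summary) are made up for illustration.

```cpp
#include <cstdio>
#include <limits>

// Significand width in bits, counting the implicit leading bit.
// On IEEE 754 platforms these are 24 (float) and 53 (double).
constexpr int kFloatBits  = std::numeric_limits<float>::digits;
constexpr int kDoubleBits = std::numeric_limits<double>::digits;

// Returns true when x + 1 rounds back to x in single precision.
// The volatile store forces the sum to be rounded to float width.
bool absorbs_one_float(float x) {
    volatile float y = x + 1.0f;
    return y == x;
}

// Same check in double precision.
bool absorbs_one_double(double x) {
    volatile double y = x + 1.0;
    return y == x;
}

// Prints size and precision of both types as reported by the implementation.
void print_precision_summary() {
    std::printf("float : %zu bytes, %d significand bits, %d decimal digits\n",
                sizeof(float), kFloatBits,
                std::numeric_limits<float>::digits10);
    std::printf("double: %zu bytes, %d significand bits, %d decimal digits\n",
                sizeof(double), kDoubleBits,
                std::numeric_limits<double>::digits10);
}
```

For example, absorbs_one_float(16777216.0f) is true because 2^24 + 1 is not representable in a 24-bit significand, while absorbs_one_double(16777216.0) is false.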
There isn't a single "Intel CPU", especially in terms of which operations are optimized relative to others, but for most of them, at the CPU level (specifically within the FPU), the answer to your question:
are double operations just as fast or faster than float operations for +, -, *, and /?
is "yes" within the CPU, except for division and sqrt, which are somewhat slower for double than for float. (Assuming your compiler uses SSE2 for scalar FP math, as all x86-64 compilers do, and some 32-bit compilers depending on options. Legacy x87 doesn't have different widths in registers, only in memory (it converts on load/store), so historically sqrt and division were equally slow for float and double.)
For example, Haswell has a divsd throughput of one per 8 to 14 cycles (data-dependent), but a divss (scalar single) throughput of one per 7 cycles. x87 fdiv is 8 to 18 cycle throughput. (Numbers from https://agner.org/optimize/. Latency correlates with throughput for division, but is higher than the throughput numbers.)
The float versions of many library functions like logf(float) and sinf(float) will also be faster than log(double) and sin(double), because they have many fewer bits of precision to get right. They can use polynomial approximations with fewer terms to get full precision for float vs. double.
However, taking up twice the memory for each number clearly implies heavier load on the cache(s) and more memory bandwidth to fill and spill those cache lines from/to RAM; the time you care about performance of a floating-point operation is when you're doing a lot of such operations, so the memory and cache considerations are crucial.
@Richard's answer points out that there are also other ways to perform FP operations (the SSE / SSE2 instructions; good old MMX was integers-only), especially suitable for simple ops on lots of data ("SIMD", single instruction / multiple data) where each vector register can pack 4 single-precision floats or only 2 double-precision ones, so this effect will be even more marked.
In the end, you do have to benchmark, but my prediction is that for reasonable (i.e., large;-) benchmarks, you'll find advantage to sticking with single precision (assuming of course that you don't need the extra bits of precision!-).
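As a starting point for such a benchmark, here is a minimal C++ sketch that times the same element-wise loop for either type. The function name time_axpy_ms and the choice of y = a*x + y as the workload are assumptions for illustration, not something from the answers here; a serious benchmark would also repeat the run, warm up the caches, and use compiler optimization.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Times y[i] = a * x[i] + y[i] over n elements of type T (float or double).
// Returns elapsed wall-clock milliseconds and hands the result vector back
// through y_out so the compiler cannot discard the loop.
template <typename T>
double time_axpy_ms(std::size_t n, T a, std::vector<T>& y_out) {
    std::vector<T> x(n, T(1));
    std::vector<T> y(n, T(2));

    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
    auto stop = std::chrono::steady_clock::now();

    y_out = std::move(y);
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_axpy_ms<float> and time_axpy_ms<double> with the same n (large enough that the arrays no longer fit in cache) gives two timings to compare. The loop touches half as many bytes in the float case, which is the memory-bandwidth effect described above.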
If all floating-point calculations are performed within the FPU, then, no, there is no difference between a double calculation and a float calculation because the floating point operations are actually performed with 80 bits of precision in the FPU stack. Entries of the FPU stack are rounded as appropriate to convert the 80-bit floating point format to the double or float floating-point format. Moving sizeof(double) bytes to/from RAM versus sizeof(float) bytes is the only difference in speed.
If, however, you have a vectorizable computation, then you can use the SSE extensions to run four float calculations in the same time as two double calculations. Therefore, clever use of the SSE instructions and the XMM registers can allow higher throughput on calculations that only use floats.
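To make the four-versus-two lane difference concrete, here is a small C++ sketch using the SSE/SSE2 intrinsics from emmintrin.h. The wrapper names add4_float and add2_double are made up for illustration, and the scalar fallback is only there so the code still compiles on targets without SSE2.

```cpp
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2; also pulls in the SSE float intrinsics
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

// Adds 4 floats at once: one 128-bit XMM register holds 4 x 32-bit lanes.
void add4_float(const float* a, const float* b, float* out) {
#if SKETCH_HAVE_SSE2
    __m128 va = _mm_loadu_ps(a);            // unaligned load of 4 floats
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); // one packed single-precision add
#else
    for (std::size_t i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}

// Adds 2 doubles at once: the same register holds only 2 x 64-bit lanes.
void add2_double(const double* a, const double* b, double* out) {
#if SKETCH_HAVE_SSE2
    __m128d va = _mm_loadu_pd(a);           // unaligned load of 2 doubles
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb)); // one packed double-precision add
#else
    for (std::size_t i = 0; i < 2; ++i) out[i] = a[i] + b[i];
#endif
}
```

Each function issues one packed add, but the float version produces four results and the double version two. Whether that turns into a 2x speedup in practice depends on the surrounding code and on memory traffic.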
Another point to consider is whether you are using a GPU (the graphics card). I work on a project that is numerically intensive, yet we do not need the precision that double offers. We use GPU cards to further speed up the processing. CUDA GPUs need a special package to support double, and local RAM on a GPU is quite fast but quite scarce. As a result, using float also doubles the amount of data we can store on the GPU.
Yet another point is memory. Floats take half as much RAM as doubles. If you are dealing with VERY large datasets, this can be a really important factor. If using double means you have to cache to disk instead of keeping everything in RAM, the difference will be huge.
So for the application I am working with, the difference is quite important.
I just want to add to the already existing great answers that the __m256? family of single-instruction-multiple-data (SIMD) C++ intrinsic functions operates on either 4 doubles in parallel (e.g. _mm256_add_pd), or 8 floats in parallel (e.g. _mm256_add_ps).
I'm not sure if this can translate to an actual speed up, but it seems possible to process 2x as many floats per instruction when SIMD is used.