Is using float type slower than using double type?
I heard that modern Intel and AMD CPUs can do calculations with doubles faster than with floats.
What about standard math functions (sqrt, pow, log, sin, cos, etc.)? Computing them in single precision should be considerably faster because it should require fewer floating-point operations. For example, single-precision sqrt can use a simpler math formula than double-precision sqrt. Also, I heard that standard math functions are faster in 64-bit mode (when compiled and run on a 64-bit OS). What is the definitive answer on this?
The difference in performance between 32-bit and 64-bit versions of an application depends greatly on the kind of application and on the data types it processes. But in general you may expect a 2-20% performance gain from mere recompilation of a program; this is explained by architectural changes in 64-bit processors [1].
Floats are faster than doubles when you don't need double's precision and you are memory-bandwidth bound and your hardware doesn't carry a penalty on floats. They conserve memory-bandwidth because they occupy half the space per number. There are also platforms that can process more floats than doubles in parallel.
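The bandwidth and parallelism argument above comes down to simple arithmetic, which can be sketched as follows. The helper names footprint_bytes and lanes_per_register are made up for this illustration, and the register widths (128 bits for SSE, 256 bits for AVX) are passed in as plain numbers rather than queried from the hardware.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Bytes needed to store n values of type T (hypothetical helper).
template <typename T>
std::size_t footprint_bytes(std::size_t n) {
    return n * sizeof(T);
}

// How many values of type T fit in one SIMD register of the given width.
// register_bits is an illustrative input, e.g. 128 for SSE or 256 for AVX.
template <typename T>
std::size_t lanes_per_register(std::size_t register_bits) {
    const std::size_t type_bits = sizeof(T) * CHAR_BIT;
    assert(register_bits % type_bits == 0);
    return register_bits / type_bits;
}

// Assumption: float is 32 bits and double is 64 bits, as on mainstream
// x86 compilers. Other platforms may differ.
static_assert(sizeof(float) == 4, "example assumes a 32-bit float");
static_assert(sizeof(double) == 8, "example assumes a 64-bit double");
```

Under those assumptions, footprint_bytes<float>(1000) is 4000 while footprint_bytes<double>(1000) is 8000, and a 128-bit register holds 4 floats but only 2 doubles. Whether that turns into a real speedup still depends on the code being bandwidth-bound or vectorized.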
64-bit programs are slightly larger, yes, but they are not inherently faster. The only difference is that a 64-bit program uses 64-bit memory addresses, which means that it takes up a little more space (because the addresses take more space to store) and that it can use more than about 4GB of RAM.
Avoiding denormal values in C++: Since double has a much wider normal range, a problem that involves many small values has a much higher probability of falling into the denormal range with float than with double, so float could be much slower than double in this case.
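To see where the denormal (subnormal) range starts for each type, something like the following can help. It is only a sketch: is_subnormal and smallest_normal are names invented here, and the code assumes IEEE 754 float and double with default floating-point settings (no flush-to-zero, no fast-math flags).

```cpp
#include <cmath>
#include <limits>

// True if x is a nonzero value below the smallest normal number of its type.
// Uses the standard std::fpclassify from <cmath>.
template <typename T>
bool is_subnormal(T x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}

// Smallest positive normal value of T. On IEEE 754 platforms this is about
// 1.18e-38 for float and about 2.23e-308 for double.
template <typename T>
T smallest_normal() {
    return std::numeric_limits<T>::min();
}
```

With those assumptions, is_subnormal(1e-40f) is true while is_subnormal(1e-40) is false: the same magnitude is already denormal as a float but still comfortably normal as a double.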
Floats generally come in two flavours: “single” and “double” precision. Single-precision floats are 32 bits in length while “doubles” are 64 bits. Due to the finite size of floats, they cannot represent all of the real numbers: there are limitations on both their precision and range.
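The precision limit is easy to demonstrate. The sketch below assumes IEEE 754 single and double precision (24 and 53 significand bits); reliable_decimal_digits and round_trip are illustrative names, not standard functions.

```cpp
#include <limits>

// Decimal digits that are guaranteed to survive a round trip through T.
template <typename T>
int reliable_decimal_digits() {
    return std::numeric_limits<T>::digits10;
}

// Store an integer in T and read it back. The result differs from the
// input when T does not have enough significand bits to hold it exactly.
template <typename T>
long long round_trip(long long value) {
    return static_cast<long long>(static_cast<T>(value));
}
```

Under that assumption, reliable_decimal_digits<float>() is 6 and reliable_decimal_digits<double>() is 15. The integer 16777217 (2^24 + 1) comes back from round_trip<float> as 16777216, while round_trip<double> returns it unchanged.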
A 64-bit processor is better for multitasking, stress testing, and running other heavy applications. 32-bit applications and operating systems need only a 32-bit CPU, though on x86 they generally also run on a 64-bit CPU. A 64-bit operating system needs a 64-bit CPU, and 64-bit applications require both a 64-bit CPU and a 64-bit OS.
The classic x86 architecture uses a floating-point unit (FPU) to perform floating-point calculations. The FPU performs all calculations in its internal registers, which have 80-bit precision each. Every time you attempt to work with float or double, the variable is first loaded from memory into the internal register of the FPU. This means that there is absolutely no difference in the speed of the actual calculations, since in any case the calculations are carried out with full 80-bit precision. The only thing that might be different is the speed of loading the value from memory and storing the result back to memory. Naturally, on a 32-bit platform it might take longer to load/store a double as compared to float. On a 64-bit platform there shouldn't be any difference.
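Whether a given build actually evaluates float and double expressions in a wider format can be checked through the standard FLT_EVAL_METHOD macro from <cfloat>. The sketch below only translates that macro into words; describe_eval_method and current_eval_method are names made up for this example. Builds that target the x87 FPU typically report 2, and SSE2 builds typically report 0, but that depends on the compiler and its flags.

```cpp
#include <cfloat>
#include <string>

// Describes a value of the standard FLT_EVAL_METHOD macro.
std::string describe_eval_method(int method) {
    switch (method) {
        case 0:  return "each operation is evaluated in its own type";
        case 1:  return "float and double are evaluated as double";
        case 2:  return "everything is evaluated as long double";
        default: return "indeterminable or implementation-defined";
    }
}

// What the current compiler and flags report for this build.
std::string current_eval_method() {
    return describe_eval_method(FLT_EVAL_METHOD);
}
```

Printing current_eval_method() from a small test program shows which of these cases applies to your own toolchain.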
Modern x86 architectures support extended instruction sets (SSE/SSE2) with new instructions that can perform the very same floating-point calculations without involving the "old" FPU instructions. However, again, I wouldn't expect to see any difference in calculation speed for float and double. And since these modern platforms are 64-bit ones, the load/store speed is supposed to be the same as well.
On a different hardware platform the situation could be different. But normally a smaller floating-point type should not provide any performance benefits. The main purpose of smaller floating-point types is to save memory, not to improve performance.
Edit: (To address @MSalters comment)
What I said above applies to fundamental arithmetical operations. When it comes to library functions, the answer will depend on several implementation details. If the platform's floating-point instruction set contains an instruction that implements the functionality of the given library function, then what I said above will normally apply to that function as well (that would normally include functions like sin, cos, sqrt). For other functions, whose functionality is not immediately supported in the FP instruction set, the situation might prove to be significantly different. It is quite possible that float versions of such functions can be implemented more efficiently than their double versions.
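In C++, which version of a library function you get depends on the argument type, because <cmath> overloads functions such as std::sqrt for float and double. A small sketch (root_single, root_double and root_promoted are illustrative names):

```cpp
#include <cmath>
#include <type_traits>

// <cmath> provides separate overloads, so the result type follows the argument.
static_assert(std::is_same<decltype(std::sqrt(2.0f)), float>::value,
              "std::sqrt(float) returns float");
static_assert(std::is_same<decltype(std::sqrt(2.0)), double>::value,
              "std::sqrt(double) returns double");

// Stays in single precision from argument to result.
float root_single(float x) { return std::sqrt(x); }

// Stays in double precision from argument to result.
double root_double(double x) { return std::sqrt(x); }

// Converts the float to double first, so the double version does the work.
// This happens implicitly when a float is passed to the C function sqrt().
double root_promoted(float x) { return std::sqrt(static_cast<double>(x)); }
```

If you want to compare the float and double versions of a function, make sure the float path is not silently promoted as in root_promoted; otherwise both measurements exercise the same double code.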
Your first question has already been answered here on SO.
Your second question is entirely dependent on the "size" of the data you are working with. It all boils down to the low-level architecture of the system and how it handles large values. 64 bits of data on a 32-bit system would require 2 cycles to access 2 registers. The same data on a 64-bit system should only take 1 cycle to access 1 register.
Everything always depends on what you're doing. I find there are no hard and fast rules, so you need to analyze the current task and choose what works best for your needs for that specific task.
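One way to do that analysis is to time the same loop with both types on your own machine. The following is a rough sketch rather than a rigorous benchmark: Timing and time_sum are names invented here, the workload (summing an array of ones) is only a stand-in for your real task, and the results will vary with compiler flags, array size and hardware.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Result of one measurement: the computed sum and the elapsed wall time.
template <typename T>
struct Timing {
    T sum;
    double seconds;
};

// Sums n ones stored as type T and measures how long the loop takes.
// The sum is returned so the compiler cannot discard the loop as unused.
// Note: with float the sum stops being exact once n exceeds 2^24.
template <typename T>
Timing<T> time_sum(std::size_t n) {
    std::vector<T> data(n, static_cast<T>(1));
    const auto start = std::chrono::steady_clock::now();
    T sum = 0;
    for (T x : data) {
        sum += x;
    }
    const auto stop = std::chrono::steady_clock::now();
    return {sum, std::chrono::duration<double>(stop - start).count()};
}
```

Calling time_sum<float>(n) and time_sum<double>(n) several times with an n large enough to exceed the cache, and comparing the seconds fields, gives a first impression of whether the smaller type helps for a memory-bound loop on your hardware.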