For example, these variables:
result (double)
a (double)
b (float)
c (float)
d (double)
A simple calculation:
result = a * (b + c) * d
How and when are the types converted, and how do I figure out what precision each calculation is performed at?
The simplest way to distinguish single-precision from double-precision computing is to look at how many bits represent the floating-point number: 32 bits for single precision and 64 bits for double precision.
Double-precision floating-point format (sometimes called FP64 or float64) is a computer number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.
Mixed-precision computing, also known as transprecision computing, uses different precision levels within a single operation to gain computational efficiency while aiming to preserve accuracy. In that sense of the term, calculations start with half-precision values for rapid matrix math.
In terms of precision, a double occupies 64 bits for a floating-point number (1 bit for the sign, 11 bits for the exponent, and 52 bits for the stored significand, which gives 53 bits of precision once the implicit leading bit is counted), i.e. a double has about 15 decimal digits of precision.
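The bit counts above can be checked on a given compiler rather than taken on faith. The following is a minimal sketch using `std::numeric_limits`; the helper names `storage_bits`, `significand_bits` and `decimal_digits` are made up for this illustration, and the values quoted in the note below assume `float` and `double` are IEEE 754 binary32 and binary64, which the language does not strictly guarantee.

```cpp
#include <climits>
#include <limits>

// Hypothetical helpers, named for this illustration only.

// Number of bits of storage the type occupies on this platform.
template <typename T>
constexpr int storage_bits() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Number of significand bits, counting the implicit leading bit.
template <typename T>
constexpr int significand_bits() {
    return std::numeric_limits<T>::digits;
}

// Number of decimal digits that survive a round trip text -> T -> text.
template <typename T>
constexpr int decimal_digits() {
    return std::numeric_limits<T>::digits10;
}
```

On a typical IEEE 754 platform, `storage_bits<double>()` is 64, `significand_bits<double>()` is 53 and `decimal_digits<double>()` is 15, while the corresponding values for `float` are 32, 24 and 6.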
Each operation is performed on operands of the same type (assuming normal arithmetic operations).
If you write an expression that mixes different types, the compiler implicitly converts ONE operand of each operation so that both have the same type.
In this situation, floats are converted to doubles:
result = a * (b + c) * d
float tmp1 = b + c; // Plus operation done on floats.
// So the result is a float
double tmp2 = a * (double)tmp1; // Multiplication done on double (as `a` is double)
// so tmp1 will be up converted to a double.
double tmp3 = tmp2 * d; // Multiplication done on doubles.
// So result is a double
result = tmp3; // No conversion as tmp3 is same type as result.
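One way to confirm the types in that breakdown is to ask the compiler directly. This is a minimal sketch, assuming C++11 or later; the namespace `demo` and the alias names are hypothetical, and the variables mirror the declarations from the question.

```cpp
#include <type_traits>

namespace demo {

// Same declarations as in the question.
double result, a, d;
float b, c;

// decltype does not evaluate its operand; it only reports the type
// the expression would have.
using SumType     = decltype(b + c);            // float + float
using ProductType = decltype(a * (b + c));      // double * float
using FullType    = decltype(a * (b + c) * d);  // double * double

static_assert(std::is_same<SumType, float>::value,
              "b + c is computed as float");
static_assert(std::is_same<ProductType, double>::value,
              "a * (b + c) is computed as double");
static_assert(std::is_same<FullType, double>::value,
              "the whole expression has type double");

}  // namespace demo
```

If any of the conversions worked differently, the file would fail to compile, so the check costs nothing at run time. Note that this reports the type of each sub-expression; an implementation is still allowed to carry extra precision in intermediate results (see `FLT_EVAL_METHOD` in `<cfloat>`).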
If you have:
float f;
double d;
...then an arithmetic expression like `f * d` converts the `float` operand to the larger type, which in this case is `double`, before the multiplication is performed.
So, the expression `a * (b + c) * d` evaluates to a `double`, and is then stored in `result`, which is also a `double`. This type promotion is done in order to avoid accidental precision loss.
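The promotion only applies once a `double` operand is involved, so the inner `b + c` is still rounded to `float` before it is widened. The sketch below makes that visible; the function names are hypothetical, and the behaviour described in the note assumes IEEE 754 `float` and `double` with round-to-nearest and no excess intermediate precision (`FLT_EVAL_METHOD == 0`).

```cpp
// Hypothetical helpers, named for this illustration only.

// The sum as it happens inside a * (b + c) * d: both operands are float,
// so the addition is rounded to float and only then widened to double.
constexpr double sum_as_float(float b, float c) {
    return b + c;
}

// The sum with one operand cast first: the other operand is then
// converted too, and the addition itself is carried out in double.
constexpr double sum_as_double(float b, float c) {
    return static_cast<double>(b) + c;
}
```

Under those assumptions, `sum_as_float(16777216.0f, 1.0f)` yields 16777216, because 2^24 + 1 does not fit in a 24-bit significand, while `sum_as_double(16777216.0f, 1.0f)` yields 16777217. If the precision of the inner sum matters, casting `b` or `c` to `double` explicitly is one way to force the whole calculation into double precision.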
For further information, read this article about the usual arithmetic conversions.