What are the rules governing C++ single and double precision mixed calculations?

For example, these variables:

result (double)
a (double)
b (float)
c (float)
d (double)

A simple calculation:

result = a * (b + c) * d

How and when are the types converted and how do I figure out what precision each calculation is performed at?

Dan asked Nov 21 '10 19:11



2 Answers

Every built-in arithmetic operation is performed on operands of the same type.

If the two operands of an operator have different types, the compiler implicitly converts ONE of them so that both have the same type.

In this situation, floats will be converted to doubles:

result      = a * (b + c) * d

float  tmp1 = b + c;            // Plus operation done on floats.
                                // So the result is a float

double tmp2 = a * (double)tmp1; // Multiplication done on doubles (as `a` is a double),
                                // so tmp1 is converted to a double first.

double tmp3 = tmp2 * d;         // Multiplication done on doubles.
                                // So result is a double

result      = tmp3;             // No conversion as tmp3 is same type as result.
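
The breakdown above can be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler; the helper name `evaluate` and the sample values are hypothetical and only mirror the variables from the question.

```cpp
#include <type_traits>

// Hypothetical helper that mirrors the question's expression.
// Parameter names and types follow the question: a, d are double; b, c are float.
constexpr double evaluate(double a, float b, float c, double d) {
    return a * (b + c) * d;
}

// float + float: both operands already have the same type, so the result is float.
static_assert(std::is_same<decltype(1.0f + 1.0f), float>::value,
              "float + float stays float");

// double * float: the float operand is converted, so the result is double.
static_assert(std::is_same<decltype(1.0 * (1.0f + 1.0f)), double>::value,
              "double * float yields double");

// The whole expression therefore has type double.
static_assert(std::is_same<decltype(1.0 * (1.0f + 1.0f) * 1.0), double>::value,
              "the full expression yields double");
```

If any of these assumptions about the result types were wrong, the file would fail to compile, which makes `decltype` a handy way to answer "what precision is this subexpression computed at?" for any expression.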
Martin York answered Sep 29 '22 12:09


If you have:

float f;
double d;

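...then an arithmetic expression like f * d will convert f to the larger of the two types, which in this case is double (d already is a double, so it is left unchanged).

So, the expression a * (b + c) * d evaluates to a double, and is then stored in result, which is also a double. Note that the subexpression b + c has two float operands, so it is computed as a float and only then converted to double for the multiplication by a. The conversion to the larger type is done in order to avoid accidental precision loss.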
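
Because the conversion happens per operator, a float-only subexpression such as b + c can still lose precision before it is widened. Here is a small sketch of that effect, assuming IEEE 754 single and double precision and no extra intermediate precision (FLT_EVAL_METHOD equal to 0); the function names are hypothetical.

```cpp
// Adds two floats the way `b + c` is evaluated: in float, then widened to double.
constexpr double sum_in_float(float b, float c) {
    return static_cast<double>(static_cast<float>(b + c));
}

// Widens one operand first, so the addition itself is done in double.
constexpr double sum_in_double(float b, float c) {
    return static_cast<double>(b) + c;
}

// 16777216 is 2^24. A float has a 24-bit significand, so 2^24 + 1 is not
// representable as a float and the float sum rounds back to 2^24.
static_assert(sum_in_float(16777216.0f, 1.0f) == 16777216.0,
              "the float addition drops the +1");

// A double has a 53-bit significand, so the same sum is exact.
static_assert(sum_in_double(16777216.0f, 1.0f) == 16777217.0,
              "the double addition keeps the +1");
```

So if the full double precision matters for the question's expression, one option is to cast b or c explicitly, for example a * ((double)b + c) * d.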

For further information, read this article about the usual arithmetic conversions.

Charles Salvia answered Sep 29 '22 14:09