
Fixed-width Floating-Point Numbers in C/C++

int is usually 32 bits, but in the standard, int is not guaranteed to have a constant width. So if we want a 32 bit int we include stdint.h and use int32_t.

Is there an equivalent of this for floats? I realize it's a bit more complicated with floats since they aren't stored in a homogeneous fashion, i.e. they are split into sign, exponent, and significand. I just want a double that is guaranteed to be stored in 64 bits with 1 sign bit, an 11-bit exponent, and a 52/53-bit significand (depending on whether you count the hidden bit).

Imagist asked Aug 26 '09 00:08




1 Answer

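According to Annex F of the current C99 draft standard, that type should be double: an implementation that defines __STDC_IEC_559__ has to map double to the IEC 60559 (IEEE 754) double format. Of course, this assumes your compilers actually conform to that part of the standard, which is optional.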

For C++, I've checked the C++0x draft and a draft of the 1998 version of the standard, but neither seems to specify anything about representation the way that part of the C99 standard does. The only hook is the bool member std::numeric_limits<T>::is_iec559, which reports whether IEEE 754/IEC 559 is used for that type on that platform, as Josh Kelley mentions.
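As a rough illustration of the numeric_limits route, the sketch below wraps the relevant traits in three small constexpr helpers. The helper names (is_ieee754, storage_bits, significand_bits) are made up for this example and are not part of any standard library; only std::numeric_limits and CHAR_BIT come from the standard. Note that digits counts radix digits, so it only means "bits" when the radix is 2.

```cpp
#include <climits>   // CHAR_BIT
#include <limits>    // std::numeric_limits

// Hypothetical helper: true when the implementation claims IEC 559
// (IEEE 754) conformance for T.
template <typename T>
constexpr bool is_ieee754() {
    return std::numeric_limits<T>::is_iec559;
}

// Hypothetical helper: storage width of T in bits.
template <typename T>
constexpr int storage_bits() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Hypothetical helper: significand precision in radix digits, counting the
// hidden bit. For a radix-2 IEEE 754 double this is 53.
template <typename T>
constexpr int significand_bits() {
    return std::numeric_limits<T>::digits;
}
```

On a platform where double is the IEEE 754 double-precision format, is_ieee754<double>() is true, storage_bits<double>() is 64, and significand_bits<double>() is 53. Because the helpers are constexpr, they can also be used inside static_assert.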

Very few platforms do not support IEEE 754, though. It generally does not pay off to design another floating-point format, since IEEE 754 is well-defined and works quite nicely. If IEEE 754 is supported, then it is a reasonable assumption that double is indeed 64 bits (IEEE 754-1985 calls that format double-precision, after all, so it makes sense).

On the off chance that double isn't double-precision, build in a sanity check so users can report it and you can handle that platform separately. If the platform doesn't support IEEE 754, you're not going to get that representation anyway unless you implement it yourself.
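One way such a sanity check might look is sketched below, assuming a C++11 compiler. The compile-time part stops the build with a message the user can report; the runtime part compares the bit patterns of two known values against the IEEE 754 double-precision encoding. The names ieee_double and double_layout_is_binary64 are hypothetical, chosen only for this example, and the runtime check assumes that doubles and 64-bit integers share the same byte order.

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::uint64_t
#include <cstring>   // std::memcpy
#include <limits>    // std::numeric_limits

// Compile-time checks: the build fails with a readable message on a platform
// where double is not the 64-bit IEEE 754 format.
static_assert(std::numeric_limits<double>::is_iec559,
              "double is not IEEE 754 / IEC 559; please report this platform");
static_assert(sizeof(double) * CHAR_BIT == 64,
              "double is not 64 bits wide; please report this platform");
static_assert(std::numeric_limits<double>::radix == 2 &&
              std::numeric_limits<double>::digits == 53,
              "double lacks a 53-bit significand; please report this platform");

// Hypothetical alias so that layout-dependent code can name its assumption.
using ieee_double = double;

// Runtime spot check (hypothetical helper): copy two known values into a
// 64-bit integer and compare against their IEEE 754 encodings.
//    1.0 -> sign 0, biased exponent 1023 (0x3FF), fraction 0
//   -2.0 -> sign 1, biased exponent 1024 (0x400), fraction 0
inline bool double_layout_is_binary64() {
    const ieee_double samples[] = {1.0, -2.0};
    const std::uint64_t expected[] = {0x3FF0000000000000ULL,
                                      0xC000000000000000ULL};
    for (int i = 0; i < 2; ++i) {
        std::uint64_t bits = 0;
        std::memcpy(&bits, &samples[i], sizeof bits);
        if (bits != expected[i]) {
            return false;
        }
    }
    return true;
}
```

A program could call double_layout_is_binary64() once at startup and print a diagnostic if it returns false. A platform that trips the static_assert lines would need its own code path, as described above.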

Michael Madsen answered Sep 28 '22 04:09
