Fixed-width Floating-Point Numbers in C/C++

Tags: c++, c, floating-point, double

int is usually 32 bits, but the standard does not guarantee its width. So if we want a 32-bit int, we include stdint.h and use int32_t.

Is there an equivalent for floats? I realize it's a bit more complicated with floats, since they aren't stored in a homogeneous fashion: there is a sign, an exponent, and a significand. I just want a double that is guaranteed to be stored in 64 bits with 1 sign bit, an 11-bit exponent, and a 52/53-bit significand (depending on whether you count the hidden bit).

243

asked Aug 26 '09 00:08

Imagist

1 Answer

According to Annex F of the current C99 draft standard, that should be double. Of course, this assumes your compiler conforms to that annex, which is optional: an implementation that does conform defines the macro __STDC_IEC_559__.

For C++, I've checked the 0x draft and a draft of the 1998 version of the standard, but neither seems to specify anything about representation the way that part of the C99 standard does, beyond a bool in numeric_limits (std::numeric_limits<T>::is_iec559) that indicates whether IEEE 754/IEC 559 is used on that platform, as Josh Kelley mentions.
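To illustrate what numeric_limits can tell you, here is a minimal sketch, assuming a C++11 compiler. The helper names (reports_ieee754, storage_bits, significand_digits) are made up for this example and are not part of any standard header; only std::numeric_limits and CHAR_BIT come from the standard library.

```cpp
#include <climits>
#include <limits>

// Hypothetical helpers (illustrative names, not from any standard header).

// True when the implementation reports that T follows IEC 559 / IEEE 754.
template <typename T>
constexpr bool reports_ieee754() {
    return std::numeric_limits<T>::is_iec559;
}

// Width of T's storage in bits.
template <typename T>
constexpr int storage_bits() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Significand precision in radix digits, counting the hidden bit.
template <typename T>
constexpr int significand_digits() {
    return std::numeric_limits<T>::digits;
}
```

On a typical IEEE 754 platform, reports_ieee754<double>() is true, storage_bits<double>() is 64 and significand_digits<double>() is 53. Since all three are constexpr, they can be printed at run time or used in compile-time checks.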

Very few platforms lack IEEE 754 support, though. It generally does not pay off to design another floating-point format, since IEEE 754 is well defined and works quite nicely. If IEEE 754 is supported, it is a reasonable assumption that double is indeed 64 bits (IEEE 754-1985 calls that format double-precision, after all, so it makes sense).

On the off chance that double isn't double-precision, build in a sanity check so users can report it and you can handle that platform separately. If the platform doesn't support IEEE 754, you're not going to get that representation anyway unless you implement it yourself.
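One possible shape for such a sanity check is sketched below, assuming C++11 and a platform that provides std::uint64_t. The names Float64, DoubleFields, split_fields, layout_matches_binary64 and check_double_layout are hypothetical, chosen only for this example. The run-time part also assumes that floating-point and integer values share the same byte order, which holds on common platforms but is not guaranteed by the standard.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstring>
#include <limits>

// Compile-time sanity check: the build fails on a platform where double is
// not the 64-bit IEEE 754 format, so the mismatch gets reported early.
static_assert(std::numeric_limits<double>::is_iec559,
              "double is not an IEEE 754 / IEC 559 type on this platform");
static_assert(sizeof(double) * CHAR_BIT == 64,
              "double is not stored in 64 bits on this platform");
static_assert(std::numeric_limits<double>::digits == 53,
              "double does not have a 53-bit significand (hidden bit included)");

// Hypothetical alias, not a standard name: use it where the code relies on
// the checks above.
using Float64 = double;

struct DoubleFields {
    std::uint64_t sign;      // 1 bit
    std::uint64_t exponent;  // 11 bits, biased by 1023
    std::uint64_t fraction;  // 52 stored bits (hidden bit not included)
};

// Copies the object representation into an integer; memcpy avoids the
// undefined behaviour of pointer-cast type punning.
inline DoubleFields split_fields(Float64 value) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return DoubleFields{bits >> 63,
                        (bits >> 52) & 0x7FFu,
                        bits & 0x000FFFFFFFFFFFFFull};
}

// Run-time check: the known values 1.0 and -2.0 must have the expected
// sign, biased exponent and fraction fields.
inline bool layout_matches_binary64() {
    const DoubleFields one = split_fields(1.0);
    const DoubleFields minus_two = split_fields(-2.0);
    return one.sign == 0 && one.exponent == 1023 && one.fraction == 0 &&
           minus_two.sign == 1 && minus_two.exponent == 1024 &&
           minus_two.fraction == 0;
}

// Optional hook: call once at startup in debug builds.
inline void check_double_layout() {
    assert(layout_matches_binary64());
}
```

The static_assert lines do most of the work: a user on an unusual platform gets a readable compile error to report. The run-time part is an extra guard against a format that has the right size and precision but a different bit layout.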

130

answered Sep 28 '22 04:09

Michael Madsen
