Create denormalized (subnormal) floating point values in C++

Question

For testing purposes I need to create one positive and one negative denormalized (aka subnormal) float and double number in C++.

How can I write/produce such a number?

Dietmar Kühl · Accepted Answer

You could use std::numeric_limits<T>::denorm_min() for the positive value and -std::numeric_limits<T>::denorm_min() for the negative one. It is just incidental that these values are special: denorm_min() is the smallest positive subnormal, so only the lowest fraction bit is set. If you don't want that, multiply by some reasonably small integer value; the product is still subnormal.
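A minimal sketch of this approach, assuming an IEEE 754 platform where subnormals are supported and not flushed to zero (for example, no -ffast-math or FTZ/DAZ mode). The helper names positive_subnormal, negative_subnormal, scaled_subnormal and is_subnormal are made up for illustration; only std::numeric_limits and std::fpclassify come from the standard library.

```cpp
#include <cmath>
#include <limits>

// Smallest positive subnormal value of T.
template <typename T>
T positive_subnormal() {
    return std::numeric_limits<T>::denorm_min();
}

// Negative subnormal with the smallest magnitude.
template <typename T>
T negative_subnormal() {
    return -std::numeric_limits<T>::denorm_min();
}

// A less "special" subnormal: k times the smallest one.
// Assumption: k is a small positive integer, so the product stays
// below std::numeric_limits<T>::min() (the smallest normal value).
template <typename T>
T scaled_subnormal(int k) {
    return static_cast<T>(k) * std::numeric_limits<T>::denorm_min();
}

// True if x is classified as subnormal by the implementation.
template <typename T>
bool is_subnormal(T x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}
```

Calling is_subnormal on each generated value is a cheap way to confirm in a test setup that the compiler flags and floating-point mode really preserve subnormals.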

Steve Hollasch · Answer

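For full flexibility and coverage, pair an unsigned integer with a floating-point value of the same size: std::uint32_t with float for 32-bit values, and std::uint64_t with double for 64-bit values (unsigned long is only 32 bits wide on some platforms). Store your desired denorm bit pattern in the integer and read out the corresponding floating-point value. A union is the traditional way to do this, but reading a union member other than the one last written is undefined behavior in C++, so copying the bytes with std::memcpy (or std::bit_cast in C++20) is the safer route. If you want to get fancy, specify the integer with bit fields for the sign, exponent and fraction portions of your floating-point number.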

32-bit denorms are in the range [0x00000001, 0x007fffff] for positive values, and [0x80000001, 0x807fffff] for negative denorms.

64-bit denorms are in the range [0x0000000000000001, 0x000fffffffffffff] for positive values, and [0x8000000000000001, 0x800fffffffffffff] for negative denorms.

If you use bit fields in your translation union, then the sign bit is arbitrary, exponent is zero, and the fraction can be any value but zero.

Tags: c++, floating-point

Silicomancer
