
Create denormalized (subnormal) floating point values in C++

For testing purposes I need to create one positive and one negative denormalized (aka subnormal) value, for both float and double, in C++.

How can I write/produce such a number?

Silicomancer asked Nov 02 '25 06:11


2 Answers

You could use std::numeric_limits<T>::denorm_min() and -std::numeric_limits<T>::denorm_min(). denorm_min() returns the smallest positive subnormal value, so only the lowest fraction bit is set; that particular bit pattern is incidental and not required for a value to be subnormal. If you don't want it, multiply by some reasonably small integer value.
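A minimal sketch of this approach, assuming IEEE 754 floating-point types and default floating-point settings (no flush-to-zero mode, no -ffast-math). The helper names positive_subnormal, negative_subnormal and check_subnormal_pair are hypothetical and exist only for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: k times the smallest positive subnormal of type T.
// k = 1 yields denorm_min() itself; k must stay small enough that the
// product remains below the smallest normal value.
template <typename T>
T positive_subnormal(int k = 1) {
    static_assert(std::numeric_limits<T>::is_iec559,
                  "this sketch assumes IEEE 754 floating point");
    return std::numeric_limits<T>::denorm_min() * static_cast<T>(k);
}

// Hypothetical helper: the negative counterpart of the value above.
template <typename T>
T negative_subnormal(int k = 1) {
    return -positive_subnormal<T>(k);
}

// Sanity check: both values must classify as subnormal and carry
// the expected sign.
template <typename T>
void check_subnormal_pair(int k) {
    const T pos = positive_subnormal<T>(k);
    const T neg = negative_subnormal<T>(k);
    assert(std::fpclassify(pos) == FP_SUBNORMAL && pos > 0);
    assert(std::fpclassify(neg) == FP_SUBNORMAL && neg < 0);
}
```

Calling check_subnormal_pair<float>(1) or check_subnormal_pair<double>(5) from a test verifies the values with std::fpclassify. If the check fails, the build or the hardware mode is probably flushing subnormals to zero.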

Dietmar Kühl answered Nov 04 '25 23:11


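For full flexibility and coverage, pair an unsigned integer with a floating-point value of the same size: std::uint32_t with float for 32-bit values, and std::uint64_t with double for 64-bit values (unsigned long is only 32 bits on some platforms, so it is not a safe match for double). Store your desired denorm bit pattern in the integer, and read out the corresponding floating-point value. A union is the traditional way to do this, but reading a union member other than the one last written is undefined behavior in standard C++, although some compilers, such as GCC, support it as a documented extension; copying the bytes with std::memcpy (or std::bit_cast in C++20) is the portable alternative. If you want to get fancy, specify the integer with bit fields for the sign, exponent and fraction portions of your floating-point number.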

32-bit denorms are in the range [0x00000001, 0x007fffff] for positive values, and [0x80000001, 0x807fffff] for negative denorms.

64-bit denorms are in the range [0x0000000000000001, 0x000fffffffffffff] for positive values, and [0x8000000000000001, 0x800fffffffffffff] for negative denorms.

If you use bit fields in your translation union, then the sign bit is arbitrary, the exponent is zero, and the fraction can be any nonzero value.
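The bit-pattern approach can be sketched as follows. This version copies the bytes with std::memcpy instead of reading through a union, and assumes that float is IEEE 754 binary32, double is binary64, and that integers and floating-point values share the same byte order. The function names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit double");

// Reinterpret a 32-bit pattern as a float by copying its bytes.
float float_from_bits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Reinterpret a 64-bit pattern as a double by copying its bytes.
double double_from_bits(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Subnormal float: sign bit as requested, exponent field zero,
// fraction nonzero and at most 23 bits wide.
float subnormal_float(bool negative, std::uint32_t fraction) {
    assert(fraction != 0 && fraction <= 0x007fffffu);
    return float_from_bits((negative ? 0x80000000u : 0u) | fraction);
}

// Subnormal double: sign bit as requested, exponent field zero,
// fraction nonzero and at most 52 bits wide.
double subnormal_double(bool negative, std::uint64_t fraction) {
    assert(fraction != 0 && fraction <= 0x000fffffffffffffull);
    return double_from_bits((negative ? 0x8000000000000000ull : 0ull) | fraction);
}
```

For example, subnormal_float(false, 0x007fffffu) gives the largest positive subnormal float and subnormal_double(true, 1) gives the negative subnormal double closest to zero. Passing each result to std::fpclassify should return FP_SUBNORMAL.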

Steve Hollasch answered Nov 04 '25 22:11


