
How to correctly normalize a floating point value in C++?

Maybe I don't understand the IEEE 754 standard that well, but given a set of floating-point values that are float or double, for example:

56.543f 3238.124124f 121.3f ...

you can convert them into values ranging from 0 to 1, i.e. normalize them, by choosing an appropriate common factor based on the maximum and minimum values in the set.

My point is that in this transformation I need much higher precision in the destination set, which ranges from 0 to 1, than in the original set, especially when the original values cover a wide numerical range (really big and really small values).

How can the float or double type (or the IEEE 754 standard, if you like) handle this situation and provide more precision for the second set of values, given that I basically won't need an integer part?

Or does it not handle this at all, and I need fixed-point math with a totally different type?

asked Dec 09 '13 by user2485710



1 Answer

Floating point numbers are stored in a format similar to scientific notation. Internally, they align the leading 1 of the binary representation to the top of the significand. Each value is carried with the same number of binary digits of precision relative to its own magnitude.

When you compress your set of floating point values to the range 0..1, the only precision loss you will get will be due to the rounding that occurs in the various steps of the process.
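A plain min-max rescaling of that kind might look like the sketch below; the function name and the use of std::minmax_element are my own illustration, not something given in the question or the answer:

```cpp
#include <algorithm>
#include <vector>

// Minimal min-max rescaling sketch: maps the smallest input to 0 and the
// largest to 1. Assumes a non-empty vector whose max differs from its min.
std::vector<float> normalize(const std::vector<float>& values)
{
    auto [lo, hi] = std::minmax_element(values.begin(), values.end());
    const float range = *hi - *lo;

    std::vector<float> result;
    result.reserve(values.size());
    for (float v : values)
        result.push_back((v - *lo) / range);   // each result lies in [0, 1]
    return result;
}
```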

If you're merely compressing by scaling, you will lose only a small amount of precision near the LSBs of the mantissa (around 1 or 2 ulp, where ulp means "unit in the last place").

If you also need to shift your data, then things get trickier. If your data is all positive, then subtracting off the smallest number will not damage anything. But, if your data is a mixture of positive and negative data, then some of your values near zero may suffer a loss in precision.
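Here is a small illustration of that effect; the concrete numbers are mine, chosen only to make the rounding visible:

```cpp
#include <cstdio>

int main()
{
    // Mixed-sign data: a large negative minimum and a tiny value near zero.
    float min_value = -1000.0f;
    float tiny      = 1e-7f;

    // Shifting by the minimum pushes the tiny value next to 1000, where a
    // float has only about 6e-5 of resolution, so the tiny part rounds away.
    float shifted   = tiny - min_value;     // ideally 1000.0000001
    float recovered = shifted + min_value;  // ideally 1e-7 again

    std::printf("original: %g, recovered: %g\n", tiny, recovered);
    // Typical output: original: 1e-07, recovered: 0  (the value was lost)
}
```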

If you do all the arithmetic at double precision, you'll carry 53 bits of precision through the calculation. If your precision needs fit within that (which likely they do), then you'll be fine. Otherwise, the exact numerical performance will depend on the distribution of your data.
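If the inputs arrive as float, one way to apply that advice is to carry the shift and scale in double and only convert back at the end; again, this is just a sketch of the idea, not code from the answer:

```cpp
#include <algorithm>
#include <vector>

// Sketch: normalize float data while carrying the intermediate arithmetic
// in double, so the subtraction and division keep 53 bits of precision.
std::vector<float> normalize_via_double(const std::vector<float>& values)
{
    auto [lo, hi] = std::minmax_element(values.begin(), values.end());
    const double min   = *lo;
    const double range = static_cast<double>(*hi) - min;

    std::vector<float> result;
    result.reserve(values.size());
    for (float v : values)
        result.push_back(static_cast<float>((v - min) / range));
    return result;
}
```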

answered Nov 14 '22 by Joe Z