Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How are floating point numbers stored in memory?

I've read that they're stored in the form of mantissa and exponent

I've read this document but I could not understand anything.

like image 640
Jitendra Avatar asked Oct 04 '11 07:10

Jitendra


People also ask

How is a floating point number stored in a register?

The floating point number will be stored as a normalized binary floating point encoding with 1 sign bit, 4 bits of fraction and a 3-bit exponent (see Chapter 8, Section 8.7).

How are numbers stored in memory?

Numbers are stored on the computer in binary form. In other words, information is encoded as a sequence of 1's and 0's. On most computers, the memory is organized into 8-bit bytes. This means each 8-bit byte stored in memory will have a separate address.

How a decimal point is stored in the memory?

Decimals are stored as floating points . A 32 bit floating point consists of 23 precision bits, one sign bit, and 8 magnitude bits. The number is written the same as scientific notation but instead of a*10^b, it is a*2^b. This is why we say floats have a limited precision but can be super big.

How are floating point numbers stored in Python?

The float type in Python represents the floating point number. Float is used to represent real numbers and is written with a decimal point dividing the integer and fractional parts. For example, 97.98, 32.3+e18, -32.54e100 all are floating point numbers.


2 Answers

To understand how they are stored, you must first understand what they are and what kind of values they are intended to handle.

Unlike integers, a floating-point value is intended to represent extremely small values as well as extremely large. For normal 32-bit floating-point values, this corresponds to values in the range from 1.175494351 * 10^-38 to 3.40282347 * 10^+38.

Clearly, using only 32 bits, it's not possible to store every digit in such numbers.

When it comes to the representation, you can see all normal floating-point numbers as a value in the range 1.0 to (almost) 2.0, scaled with a power of two. So:

  • 1.0 is simply 1.0 * 2^0,
  • 2.0 is 1.0 * 2^1, and
  • -5.0 is -1.25 * 2^2.

So, what is needed to encode this, as efficiently as possible? What do we really need?

  • The sign of the expression.
  • The exponent
  • The value in the range 1.0 to (almost) 2.0. This is known as the "mantissa" or the significand.

This is encoded as follows, according to the IEEE-754 floating-point standard.

  • The sign is a single bit.
  • The exponent is stored as an unsigned integer, for 32-bits floating-point values, this field is 8 bits. 1 represents the smallest exponent and "all ones - 1" the largest. (0 and "all ones" are used to encode special values, see below.) A value in the middle (127, in the 32-bit case) represents zero, this is also known as the bias.
  • When looking at the mantissa (the value between 1.0 and (almost) 2.0), one sees that all possible values start with a "1" (both in the decimal and binary representation). This means that it's no point in storing it. The rest of the binary digits are stored in an integer field, in the 32-bit case this field is 23 bits.

In addition to the normal floating-point values, there are a number of special values:

  • Zero is encoded with both exponent and mantissa as zero. The sign bit is used to represent "plus zero" and "minus zero". A minus zero is useful when the result of an operation is extremely small, but it's still important to know from which direction the operation came from.
  • plus and minus infinity -- represented using an "all ones" exponent and a zero mantissa field.
  • Not a Number (NaN) -- represented using an "all ones" exponent and a non-zero mantissa.
  • Denormalized numbers -- numbers smaller than the smallest normal number. Represented using a zero exponent field and a non-zero mantissa. The special thing with these numbers is that the precision (i.e. the number of digits a value can contain) will drop the smaller the value becomes, simply because there is not room for them in the mantissa.

Finally, the following is a handful of concrete examples (all values are in hex):

  • 1.0 : 3f800000
  • -1234.0 : c49a4000
  • 100000000000000000000000.0: 65a96816
like image 153
Lindydancer Avatar answered Oct 12 '22 05:10

Lindydancer


In layman's terms, it's essentially scientific notation in binary. The formal standard (with details) is IEEE 754.

like image 39
Wyzard Avatar answered Oct 12 '22 07:10

Wyzard