I am trying to understand floating point arithmetic better and have seen a few links to 'What Every Computer Scientist Should Know About Floating-Point Arithmetic'. I still don't understand how a number like 0.1 or 0.5 is stored in floats and as decimals. Can someone please explain how it is laid out in memory? I know a float has two parts (i.e., a number multiplied by two to the power of something).

How To Represent 0.1 In Floating Point Arithmetic And Decimal

I've always pointed people towards Harald Schmidt's online converter, along with the Wikipedia IEEE754-1985 article with its nice pictures.

For those two specific values, you get (for 0.1):

s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm    1/n
0 01111011 10011001100110011001101
           |  ||  ||  ||  ||  || +- 8388608
           |  ||  ||  ||  ||  |+--- 2097152
           |  ||  ||  ||  ||  +---- 1048576
           |  ||  ||  ||  |+-------  131072
           |  ||  ||  ||  +--------   65536
           |  ||  ||  |+-----------    8192
           |  ||  ||  +------------    4096
           |  ||  |+---------------     512
           |  ||  +----------------     256
           |  |+-------------------      32
           |  +--------------------      16
           +-----------------------       2
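If you'd rather not rely on an online converter, you can dump the same bit pattern yourself. Below is a minimal Python sketch (Python is used for brevity even though the question is tagged C#, and `float32_layout` is just an illustrative helper name, not a library API). It assumes you want the value rounded to IEEE 754 single precision, which is what packing with the standard `struct` module's `'f'` format does.

```python
import struct

def float32_layout(x: float) -> str:
    """Return the single-precision bit pattern of x as 's eeeeeeee mmm...'."""
    # '>f' packs x as a big-endian 32-bit float (rounding it to single precision);
    # '>I' reinterprets those same 4 bytes as an unsigned 32-bit integer.
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    b = format(bits, '032b')            # zero-padded 32-character binary string
    return f"{b[0]} {b[1:9]} {b[9:]}"   # 1 sign bit, 8 exponent bits, 23 mantissa bits

print(float32_layout(0.1))  # 0 01111011 10011001100110011001101
```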

The sign bit is 0, so the number is positive. That's pretty easy.

The exponent bits 01111011 give 64+32+16+8+2+1 = 123. Subtracting the bias of 127 leaves 123 - 127 = -4, so the multiplier is 2^-4, or 1/16.

The mantissa is chunkier. It consists of 1 (the implicit base, a leading 1 bit that is not actually stored) plus the values of all the set mantissa bits, where the nth bit from the left (starting at n = 1) is worth 1/(2ⁿ). Here those values are {1/2, 1/16, 1/32, 1/256, 1/512, 1/4096, 1/8192, 1/65536, 1/131072, 1/1048576, 1/2097152, 1/8388608}.

When you add all these up, you get 1.60000002384185791015625.
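The mantissa arithmetic can be checked exactly with Python's `fractions.Fraction`, so no rounding sneaks into the check itself. A small sketch, assuming the 23 mantissa bits are copied from the 0.1 diagram (the name `mantissa_bits` is just for illustration):

```python
from decimal import Decimal
from fractions import Fraction

# The 23 stored mantissa bits of 0.1 as a single-precision float.
mantissa_bits = "10011001100110011001101"

# Bit n (counting from 1 at the left) is worth 1/(2**n); add the implicit 1 in front.
significand = 1 + sum(Fraction(1, 2**n)
                      for n, bit in enumerate(mantissa_bits, start=1) if bit == "1")

print(significand)  # 13421773/8388608
# The denominator is a power of two, so the decimal expansion terminates.
print(Decimal(significand.numerator) / Decimal(significand.denominator))  # 1.60000002384185791015625
```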

When you multiply that by the multiplier of 1/16, you get 0.100000001490116119384765625. That is why they say you cannot represent 0.1 exactly as an IEEE754 float (in binary, 1/10 is the endlessly repeating fraction 0.0001100110011..., so it has to be rounded somewhere), and why SO gets so many "why doesn't 0.1 + 0.1 + 0.1 == 0.3?"-type questions :-)
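To see that stored value without doing the sums by hand, you can round 0.1 to single precision and print its exact decimal expansion. This Python sketch assumes a `struct` round trip (pack as `'f'`, unpack again) as the way to get the nearest float; `Decimal(float)` then converts the binary value exactly.

```python
import struct
from decimal import Decimal

# Packing with 'f' rounds 0.1 to the nearest single-precision float; unpacking
# hands it back as a Python float (a double), which holds that value exactly.
(f32,) = struct.unpack('<f', struct.pack('<f', 0.1))

# Decimal(float) converts the binary value exactly, digit for digit.
print(Decimal(f32))                    # 0.100000001490116119384765625
print(f32 == 0.1)                      # False: the double nearest 0.1 is a different value
print(Decimal(0.1) == Decimal("0.1"))  # False: a double can't hold 0.1 exactly either
```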

The 0.5 example is substantially easier. It's represented as:

s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
0 01111110 00000000000000000000000

which means it's the implicit base, 1, plus no other additives (all the mantissa bits are zero).

The sign is again positive. The exponent bits 01111110 give 64+32+16+8+4+2 = 126, and subtracting the bias of 127 leaves -1. Hence the multiplier is 2^-1, which is 1/2 or 0.5.

So the final value is 1 multiplied by 0.5, or 0.5. Voila!
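The fields can also be pulled out with shifts and masks rather than by reading binary strings. A minimal Python sketch; `float32_fields` is a hypothetical helper name, and the masks assume the single-precision layout of 1 sign bit, 8 exponent bits and 23 mantissa bits shown above:

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to single precision) into raw sign, exponent and mantissa fields."""
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    sign = bits >> 31                 # top bit
    exponent = (bits >> 23) & 0xFF    # next 8 bits, still biased by 127
    mantissa = bits & 0x7FFFFF        # low 23 bits, without the implicit 1
    return sign, exponent, mantissa

s, e, m = float32_fields(0.5)
print(s, e, e - 127, m)  # 0 126 -1 0
```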

I've sometimes found it easier to think of it in terms of decimal.

The number 1.345 is equivalent to

1 + 3/10 + 4/100 + 5/1000

or:

1 + 3*10^-1 + 4*10^-2 + 5*10^-3
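The decimal analogy carries straight over to base 2: each digit is multiplied by a negative power of the base. A quick exact check with `fractions.Fraction` (a sketch; the digit lists are simply 1.345 in decimal and, for comparison, 1.101 in binary):

```python
from fractions import Fraction

# 1.345 = 1*10^0 + 3*10^-1 + 4*10^-2 + 5*10^-3
digits = [1, 3, 4, 5]
value = sum(Fraction(d, 10**i) for i, d in enumerate(digits))
print(value == Fraction("1.345"))  # True

# Same idea in base 2: 1.101 (binary) = 1*2^0 + 1*2^-1 + 0*2^-2 + 1*2^-3
bits = [1, 1, 0, 1]
binary_value = sum(Fraction(b, 2**i) for i, b in enumerate(bits))
print(binary_value)  # 13/8
```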

Similarly, the IEEE754 representation for decimal 0.8125 is:

s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
0 01111110 10100000000000000000000

With the implicit base of 1, that's equivalent to the binary:

1.101 * 2^(01111110 - 01111111)

or:

(1 + 1/2 + 1/8) * 2^-1   (no 1/4 since that bit is 0)

which becomes:

(8/8 + 4/8 + 1/8) * 1/2

and then becomes:

13/8 * 1/2 = 0.8125
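Putting the steps together (the implicit 1 plus the mantissa fraction, times 2 to the power of the exponent field minus 127), here is a Python sketch that rebuilds the exact value from the raw bits. It is deliberately limited: `float32_value` is a hypothetical helper that only handles normalized numbers and raises for zero, subnormals, infinity and NaN, which follow different rules.

```python
import struct
from fractions import Fraction

def float32_value(x: float) -> Fraction:
    """Rebuild the exact value of x (rounded to single precision) from its bit fields."""
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    # Exponent field 0 (zero/subnormals) and 255 (infinity/NaN) are special cases.
    if exponent in (0, 0xFF):
        raise ValueError("only normalized numbers are handled in this sketch")
    significand = 1 + Fraction(mantissa, 2**23)             # implicit 1 plus fraction bits
    value = significand * Fraction(2) ** (exponent - 127)   # remove the bias of 127
    return -value if sign else value

print(float32_value(0.8125))  # 13/16
```

Running it on 0.1 returns 13421773/134217728, which is exactly the 0.100000001490116119384765625 worked out above.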

answered Sep 30 '22 07:09 by paxdiablo

Tags: c#, floating-point, double, decimal, bit-representation

Jack Kada
