 

Golang floating point precision float32 vs float64

I wrote a program to demonstrate floating point error in Go:

package main

import "fmt"

func main() {
    a := float64(0.2)
    a += 0.1
    a -= 0.3

    var i int
    for i = 0; a < 1.0; i++ {
        a += a
    }
    fmt.Printf("After %d iterations, a = %e\n", i, a)
}

It prints:

After 54 iterations, a = 1.000000e+00 

This matches the behaviour of the same program written in C (using the double type).

However, if float32 is used instead, the program gets stuck in an infinite loop! If you modify the C program to use a float instead of a double, it prints

After 27 iterations, a = 1.600000e+00 

Why doesn't the Go program have the same output as the C program when using float32?
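
For illustration, here is a float32 variant of the same program, sketched with an iteration cap so it terminates instead of spinning forever as described above:

package main

import "fmt"

func main() {
    a := float32(0.2)
    a += 0.1
    a -= 0.3

    var i int
    // With float32 the loop condition never fails, so cap the
    // iteration count just to keep this sketch finite.
    for i = 0; a < 1.0 && i < 100; i++ {
        a += a
    }
    fmt.Printf("After %d iterations, a = %e\n", i, a)
}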

asked Mar 11 '14 by charliehorse55

People also ask

Should I use float32 or float64?

float32 is less accurate but faster than float64, and float64 is more accurate than float32 but consumes more memory. If accuracy is more important than speed, use float64; if speed is more important than accuracy, use float32.

What is the difference between float32 and float64 Golang?

Go has two floating point types: float32 and float64. float32 occupies 32 bits in memory and stores values in single-precision floating point format. float64 occupies 64 bits in memory and stores values in double-precision floating point format.

Is float 32-bit or 64-bit?

Floats generally come in two flavours: “single” and “double” precision. Single precision floats are 32-bits in length while “doubles” are 64-bits. Due to the finite size of floats, they cannot represent all of the real numbers - there are limitations on both their precision and range.

Are float and float32 the same?

float is the generic name for a numeric data type used to store decimal numbers; Go itself has no type literally named float. float32 is the Go floating point type that stores decimal values in 32 bits of data.
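
A small sketch of the precision difference these answers describe, printing the same value at both precisions (the digit counts in the comments are approximate):

package main

import "fmt"

func main() {
    // float32 keeps roughly 7 significant decimal digits,
    // float64 roughly 15-16.
    var f32 float32 = 1.0 / 3.0
    var f64 float64 = 1.0 / 3.0
    fmt.Printf("float32: %.20f\n", f32) // correct to about 7 digits
    fmt.Printf("float64: %.20f\n", f64) // correct to about 16 digits
}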


2 Answers

Using math.Float32bits and math.Float64bits, you can see how Go represents the different decimal values as IEEE 754 binary values:

Playground: https://play.golang.org/p/ZqzdCZLfvC
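
The playground program is presumably something along these lines, a sketch that prints the raw IEEE 754 bit patterns:

package main

import (
    "fmt"
    "math"
)

func main() {
    // Print the single-precision bit patterns (32 bits, zero-padded).
    for _, v := range []float64{0.1, 0.2, 0.3} {
        fmt.Printf("float32(%v): %032b\n", v, math.Float32bits(float32(v)))
    }
    // Print the double-precision bit patterns (64 bits, zero-padded).
    for _, v := range []float64{0.1, 0.2, 0.3} {
        fmt.Printf("float64(%v): %064b\n", v, math.Float64bits(v))
    }
}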

Result:

float32(0.1): 00111101110011001100110011001101
float32(0.2): 00111110010011001100110011001101
float32(0.3): 00111110100110011001100110011010
float64(0.1): 0011111110111001100110011001100110011001100110011001100110011010
float64(0.2): 0011111111001001100110011001100110011001100110011001100110011010
float64(0.3): 0011111111010011001100110011001100110011001100110011001100110011

If you convert these binary representations to decimal values and do your loop, you can see that for float32, the initial value of a will be:

0.20000000298023224 + 0.10000000149011612 - 0.30000001192092896 = -7.4505806e-9 

a negative value that can never sum up to 1.

So, why does C behave differently?

If you look at the binary patterns (and know a little about how binary floating point values are represented), you can see that Go rounds the last bit, while I assume C just crops (truncates) it instead.

So, in a sense, while neither Go nor C can represent 0.1 exactly in a float, Go uses the value closest to 0.1:

Go:   00111101110011001100110011001101 => 0.10000000149011612
C(?): 00111101110011001100110011001100 => 0.09999999403953552
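
Both decimal values can be checked by feeding the bit patterns back through math.Float32frombits; a sketch:

package main

import (
    "fmt"
    "math"
    "strconv"
)

func main() {
    patterns := []string{
        "00111101110011001100110011001101", // last bit rounded (Go)
        "00111101110011001100110011001100", // last bit cropped
    }
    for _, p := range patterns {
        // Parse the 32-bit pattern and reinterpret it as a float32.
        bits, err := strconv.ParseUint(p, 2, 32)
        if err != nil {
            panic(err)
        }
        fmt.Printf("%s => %.17g\n", p, math.Float32frombits(uint32(bits)))
    }
}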

Edit:

I posted a question about how C handles float constants, and from the answer it seems that any implementation of the C standard is allowed to do either. The implementation you tried it with just did it differently than Go.

answered Oct 01 '22 by ANisus


Agree with ANisus, Go is doing the right thing. Concerning C, I'm not convinced by his guess.

The C standard does not dictate this, but most libc implementations will convert the decimal representation to the nearest float (at least to comply with IEEE 754-2008 or ISO 10967), so I don't think this is the most probable explanation.

There are several reasons why the C program's behavior might differ... In particular, some intermediate computations might be performed with excess precision (double or long double).

The most probable cause I can think of is that you wrote 0.1 instead of 0.1f in C.
In that case, you get excess precision in the initialization
(you sum float a + double 0.1 => the float is converted to double, then the result is converted back to float).

If I emulate these operations

float32(float32(float32(0.2) + float64(0.1)) - float64(0.3)) 

Then I find something near 1.1920929e-8f

After 27 iterations, this sums to 1.6f
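
A sketch of that emulation as a runnable Go program (the mixed float/double arithmetic below is my reading of the hypothesised C code, not the questioner's actual source):

package main

import "fmt"

func main() {
    // Emulate C's  float a = 0.2f; a += 0.1; a -= 0.3;  where 0.1 and 0.3
    // are double constants: each step promotes the float to double and
    // rounds the result back to float.
    a := float32(0.2)
    a = float32(float64(a) + 0.1)
    a = float32(float64(a) - 0.3)
    fmt.Println(a) // about 1.1920929e-08

    var i int
    for i = 0; a < 1.0; i++ {
        a += a
    }
    // After 27 iterations, a = 1.600000e+00, matching the C output above.
    fmt.Printf("After %d iterations, a = %e\n", i, a)
}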

answered Oct 01 '22 by aka.nice