I'm sure this is due to a lapse in my understanding of how casting between different float precisions works, but can someone explain why the value comes out 3 less than its true value in the 32-bit versus the 64-bit representation?
>>> import numpy as np
>>> a = np.array([83734315])
>>> a.astype('f')
array([ 83734312.], dtype=float32)
>>> a.astype('float64')
array([ 83734315.])
A 32-bit float can exactly represent only about 7 significant decimal digits. Your number requires more, and therefore cannot be represented exactly.
The mechanics of what happens are as follows:
A 32-bit float has a 24-bit mantissa. Your number requires 27 bits to be represented exactly, so the last three bits are getting truncated (set to zero). The three lowest bits of your number are 011₂; these are getting set to 000₂. Observe that 011₂ is 3₁₀.
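You can check the mechanics directly in the REPL. This is just a sketch to illustrate the point above; it only assumes numpy has been imported as np:

>>> x = 83734315
>>> x.bit_length()          # needs 27 bits, but a float32 mantissa keeps only 24
27
>>> bin(x)[-3:]             # the three low bits that get dropped
'011'
>>> int(np.float32(x))      # stored value: the low three bits zeroed out
83734312
>>> x - int(np.float32(x))  # the lost amount is exactly 011 in binary, i.e. 3
3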
A float32 only has 24 bits of significand precision, which is roughly seven decimal digits (log10(2**24) ≈ 7.22). You're expecting it to store an 8-digit number exactly, which in general is impossible.
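A quick way to confirm the seven-digit figure, again as a sketch rather than part of the original answer (it uses only the standard library and numpy):

>>> import math
>>> import numpy as np
>>> round(math.log10(2**24), 2)   # float32 keeps about 7 significant decimal digits
7.22
>>> np.float32(2**24) + np.float32(1) == np.float32(2**24)  # 2**24 + 1 is already not representable
True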