I do not understand why casting a float32 array to a float64 array changes the mean of the array so significantly:
import numpy as np
# 10 million values in [1000, 1100), stored as float32
a = (100. * np.random.random_sample(10000000) + 1000.).astype(np.float32)
b = a.astype(np.float64)
print(np.mean(a), a.dtype, a.shape)
print(np.mean(b), b.dtype, b.shape)
Result (the expected mean is approximately 1050, so the float64 value is the correct one):
1028.346368 float32 (10000000,)
1049.98284473 float64 (10000000,)
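A minimal sketch of the mechanism, assuming the elements are added one after another into a float32 running sum (the seed and variable names below are illustrative). `np.cumsum` is used only because it accumulates sequentially; since NumPy 1.9, sums along the fast axis use pairwise summation, so on a recent NumPy `np.mean(a)` may not reproduce the 1028 figure at all.

```python
import numpy as np

# Same kind of data as in the question: 10 million values in [1000, 1100).
# The seeded generator is an assumption, used only for reproducibility.
rng = np.random.default_rng(0)
a = (100.0 * rng.random(10_000_000) + 1000.0).astype(np.float32)

# Sequential float32 accumulation: every prefix sum is rounded to float32.
naive32 = float(np.cumsum(a, dtype=np.float32)[-1]) / a.size

# Reference: the same float32 data, but accumulated in float64.
ref64 = float(np.sum(a, dtype=np.float64)) / a.size

# Near the end the running sum is about 1e10, where adjacent
# float32 values are 1024 apart.
gap = float(np.spacing(np.float32(1e10)))

print(naive32, ref64, gap)
```

Once the running sum passes 2**31 (about 2.1e9), every addend in [1000, 1100) rounds to exactly 1024, which is consistent with a sequential float32 mean near the 1028 reported in the question.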
@bogatron has explained what causes the loss in precision: for float32 input, np.mean accumulates the sum in float32 by default. To get around this kind of problem, np.mean has an optional dtype argument that lets you specify the type used for the internal accumulation, so you can do:
>>> np.mean(a)
1028.3446272000001
>>> np.mean(a.astype(np.float64))
1049.9776601123901
>>> np.mean(a, dtype=np.float64)
1049.9776601123901
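To check that the `dtype` argument changes only the accumulator and not the stored data, a small sketch (the seed, the array size and the names `m_copy`/`m_acc` are assumptions) can compare the two float64 routes and confirm that `a` itself stays float32:

```python
import numpy as np

rng = np.random.default_rng(1)
a = (100.0 * rng.random(1_000_000) + 1000.0).astype(np.float32)

# Route 1: build a full float64 copy of the array, then average it.
m_copy = np.mean(a.astype(np.float64))

# Route 2: keep the float32 data, but accumulate the mean in float64.
m_acc = np.mean(a, dtype=np.float64)

print(m_copy, m_acc)
# The input array keeps its float32 dtype; the result is a float64 scalar.
print(a.dtype, type(m_acc))
```

The two results agree to within float64 rounding; the only difference is that route 1 has to materialize an extra array twice the size of `a`.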
The third version is significantly faster than the second (which first has to allocate and fill a full float64 copy of the array), although slower than the first:
In [3]: %timeit np.mean(a)
100 loops, best of 3: 10.9 ms per loop
In [4]: %timeit np.mean(a.astype(np.float64))
10 loops, best of 3: 51 ms per loop
In [5]: %timeit np.mean(a, dtype=np.float64)
100 loops, best of 3: 19.2 ms per loop
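Outside IPython, the same comparison can be sketched with the standard-library `timeit` module. Absolute numbers depend on the machine and NumPy version, so none are shown here; the array size matches the question, while the seed and the helper name `best_ms` are assumptions.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = (100.0 * rng.random(10_000_000) + 1000.0).astype(np.float32)

# The three variants from the answer, wrapped as zero-argument callables.
candidates = {
    "np.mean(a)": lambda: np.mean(a),
    "np.mean(a.astype(np.float64))": lambda: np.mean(a.astype(np.float64)),
    "np.mean(a, dtype=np.float64)": lambda: np.mean(a, dtype=np.float64),
}

def best_ms(func, number=5, repeat=3):
    """Best average time per call, in ms, over `repeat` runs of `number` calls."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e3

timings = {label: best_ms(func) for label, func in candidates.items()}
for label, ms in timings.items():
    print(f"{label:32s} {ms:8.2f} ms")
```

Taking the minimum over several repeats, as `%timeit` does, reduces the influence of background noise on the comparison.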