
numpy.mean precision for large arrays

I do not understand why casting a float32 array to a float64 array changes the mean of the array significantly.

import numpy as n

# 10 million float32 samples, uniform in [1000, 1100)
a = n.float32(100. * n.random.random_sample(10000000) + 1000.)
b = a.astype(n.float64)  # same values, stored as float64
print(n.mean(a), a.dtype, a.shape)
print(n.mean(b), b.dtype, b.shape)

Result (the mean should be approximately 1050, so the float64 value is the correct one):

1028.346368   float32 (10000000,)                                                          
1049.98284473 float64 (10000000,)
user1514974 asked Jun 13 '26 01:06



1 Answer

@bogatron has explained what causes the loss in precision. To get around this kind of problem, np.mean has an optional dtype argument that lets you specify which type to use for the internal operations (for float32 input, the default is to accumulate in float32). So you can do:

>>> np.mean(a)
1028.3446272000001
>>> np.mean(a.astype(np.float64))
1049.9776601123901
>>> np.mean(a, dtype=np.float64)
1049.9776601123901

The third case is significantly faster than the second, although slower than the first:

In [3]: %timeit np.mean(a)
100 loops, best of 3: 10.9 ms per loop

In [4]: %timeit np.mean(a.astype(np.float64))
10 loops, best of 3: 51 ms per loop

In [5]: %timeit np.mean(a, dtype=np.float64)
100 loops, best of 3: 19.2 ms per loop
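
To see where the error comes from, here is a minimal sketch rather than a definitive reproduction. It assumes a seeded generator (np.random.default_rng, not used in the question) and uses np.cumsum with dtype=np.float32 to force strictly sequential float32 accumulation, since recent NumPy releases may sum float32 arrays more accurately than the output in the question suggests. The variable names are mine.

```python
import numpy as np

# Assumed setup: same distribution as in the question, with a fixed seed.
rng = np.random.default_rng(0)
a = (100.0 * rng.random(10_000_000) + 1000.0).astype(np.float32)

# One float32 addition at the scale the running sum reaches (about 1e10):
# neighbouring float32 values are 1024 apart there, so adding 1050
# actually adds 1024.
big = np.float32(2.0**33)
step = float(big + np.float32(1050.0)) - float(big)

# Strictly sequential float32 accumulation: every partial sum is rounded
# to float32 before the next element is added.
naive_mean = float(np.cumsum(a, dtype=np.float32)[-1]) / a.size

# Same data, float64 accumulator, without making a float64 copy of the array.
mean64 = float(np.mean(a, dtype=np.float64))

print(step)        # how much a single +1050 really adds near 2**33
print(naive_mean)  # sequential float32 result
print(mean64)      # float64-accumulated result
```

Once the running total is large, each addition of roughly 1050 gets rounded to a multiple of the float32 spacing, so the sequential float32 mean ends up well below the float64 one. Passing dtype=np.float64 changes only the accumulator, not the array.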
Jaime answered Jun 17 '26 08:06



