While tracking down a related problem, I stumbled upon the fact that np.std seems to return different values depending on whether the axis keyword argument is specified or the corresponding column is sliced out manually. Consider the following snippet:
import numpy as np
np.random.seed(123)
a = np.empty(shape=(100, 2), dtype=float)
a[:, 0] = np.random.uniform()
a[:, 1] = np.random.uniform()
print(np.std(a, axis=0)[0] == np.std(a[:, 0])) # Should be the same.
print(np.std(a, axis=0)[1] == np.std(a[:, 1])) # Should be the same.
However, the two computations don't return the same result. Further inspection reveals:
>>> print('column 0: {:e} vs {:e}'.format(np.std(a, axis=0)[0], np.std(a[:, 0])))
column 0: 7.771561e-16 vs 2.220446e-16
>>> print('column 1: {:e} vs {:e}'.format(np.std(a, axis=0)[1], np.std(a[:, 1])))
column 1: 4.440892e-16 vs 0.000000e+00
I don't see why the two ways of computing this would return different results, since formally they describe the same procedure (slicing out the column manually or letting numpy do the job by specifying axis shouldn't make a difference).
I am using Python 3.5.2 and numpy 1.15.0.
These numbers, as you may have noticed, are quite small: on the order of machine epsilon for double precision (about 2.22e-16). So small, in fact, that neither result is particularly accurate. At that scale, minor differences in implementation will produce different answers due to floating-point rounding. NumPy's implementation of std, which is written in C, performs the axis reduction differently (e.g. in a different summation order) than the explicit per-column computation does, so the rounding errors accumulate differently.
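As a quick illustration of how summation order alone changes the last few bits of a result (this is not NumPy's actual std algorithm, just a minimal demonstration of the effect):

```python
import math

# Summing the same ten values two different ways:
xs = [0.1] * 10
naive = sum(xs)        # left-to-right accumulation, rounds at every step
exact = math.fsum(xs)  # exactly rounded summation

print(naive)  # 0.9999999999999999
print(exact)  # 1.0
```

Both sums are "correct" to within rounding, yet they differ in the last bit, which is exactly the kind of discrepancy seen between the two std calls above.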
The 'real' standard deviation of each column of this data is, of course, 0.
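To see that both results are just rounding noise around that exact answer of 0, you can compare them to machine epsilon (a sketch reusing the setup from the question):

```python
import numpy as np

np.random.seed(123)
a = np.empty(shape=(100, 2), dtype=float)
a[:, 0] = np.random.uniform()  # each column holds a single constant value
a[:, 1] = np.random.uniform()

eps = np.finfo(float).eps  # ~2.220446e-16 for float64
# Both computations land within a few multiples of eps of the exact answer, 0:
print(np.std(a, axis=0)[0] / eps)
print(np.std(a[:, 0]) / eps)
```

Neither result is "more wrong" than the other; they are simply different rounding paths to a quantity whose true value is zero.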