Numpy computes different standard deviation when axis is specified

In the course of tracking down a related problem, I stumbled upon the fact that np.std seems to return different values depending on whether the axis keyword argument is specified or the corresponding column is selected manually by slicing. Consider the following snippet:

import numpy as np

np.random.seed(123)

a = np.empty(shape=(100, 2), dtype=float)
a[:, 0] = np.random.uniform()  # one scalar, broadcast down the whole column
a[:, 1] = np.random.uniform()  # likewise, so each column is constant

print(np.std(a, axis=0)[0] == np.std(a[:, 0]))  # Should be the same.
print(np.std(a, axis=0)[1] == np.std(a[:, 1]))  # Should be the same.

However, the two computations don't return the same result: both lines print False. Further inspection reveals:

>>> print('column 0: {:e} vs {:e}'.format(np.std(a, axis=0)[0], np.std(a[:, 0])))
column 0: 7.771561e-16 vs 2.220446e-16
>>> print('column 1: {:e} vs {:e}'.format(np.std(a, axis=0)[1], np.std(a[:, 1])))
column 1: 4.440892e-16 vs 0.000000e+00

I don't see why the two ways of computing it would return different results, since formally they describe the same procedure (selecting the column manually or letting numpy do the job by specifying axis shouldn't make a difference).
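For reference, the np.std documentation defines the default (ddof=0) standard deviation as sqrt(mean(abs(x - x.mean())**2)). The sketch below applies that formula by hand, once with an axis and once per sliced column, to show that the two paths really are the same computation on paper. The helper `std_by_formula` is hypothetical, and the 1e-12 tolerance is an arbitrary choice for illustration.

```python
import numpy as np

np.random.seed(123)
a = np.empty(shape=(100, 2), dtype=float)
a[:, 0] = np.random.uniform()
a[:, 1] = np.random.uniform()


def std_by_formula(x, axis=None):
    """Hypothetical helper: population std (ddof=0) as sqrt(mean(abs(x - mean)**2))."""
    centered = x - x.mean(axis=axis, keepdims=True)
    return np.sqrt(np.mean(np.abs(centered) ** 2, axis=axis))


# Path 1: reduce along axis 0 of the 2-D array.
per_axis = std_by_formula(a, axis=0)
# Path 2: slice out each column and reduce the 1-D view.
per_column = np.array([std_by_formula(a[:, j]) for j in range(a.shape[1])])

# Same formula, same data: the results agree up to rounding noise.
print(np.allclose(per_axis, per_column, rtol=0.0, atol=1e-12))
```

Since the formula is identical, any disagreement in the last bits can only come from how the intermediate sums are rounded.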


I am using Python 3.5.2 and numpy 1.15.0.

asked May 13 '26 18:05 by a_guest


1 Answer

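These numbers, as you may have noticed, are quite small. So small, in fact, that neither is particularly accurate: they are on the order of float64 machine epsilon (about 2.2e-16), i.e. rounding noise. At that scale, minor differences in implementation will produce different answers due to the limited precision of floating-point arithmetic. np.std does its summations with np.add.reduce, whose C loops treat a reduction along an axis differently from the explicit one-dimensional case here. As the np.sum documentation explains, NumPy's more accurate pairwise summation is always used when no axis is given, but with an axis it is only used when summing along the fast (contiguous) axis in memory. Reducing along axis 0 of your C-ordered array is not that case, so those column sums are accumulated element by element, whereas np.std(a[:, 0]) sums the column pairwise.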
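To see why the order of additions matters, here is a simplified, pure-Python sketch of the two summation strategies. It is not NumPy's actual C code (NumPy's pairwise routine also unrolls small blocks into several partial sums); it only illustrates the idea, with `math.fsum` serving as a correctly rounded reference. The function names are hypothetical.

```python
import math
import random


def naive_sum(values):
    """Add the elements left to right; a rounding error can occur at every step."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values, block=8):
    """Recursively split in halves and add the partial sums; error grows roughly with log(n)."""
    if len(values) <= block:
        return naive_sum(values)
    mid = len(values) // 2
    return pairwise_sum(values[:mid], block) + pairwise_sum(values[mid:], block)


random.seed(123)
values = [random.random()] * 100  # one constant "column" of 100 identical floats
exact = math.fsum(values)         # correctly rounded sum, used as the reference

print("naive error:   ", naive_sum(values) - exact)
print("pairwise error:", pairwise_sum(values) - exact)
```

The two strategies may round differently, so the resulting means, and hence the standard deviations, can differ in the last few bits even though both are "correct" to within floating-point precision.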

Of course, the 'real' standard deviation of each column is 0: np.random.uniform() with no size argument returns a single float, so every entry in a column holds the same value.
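In practice, results like these should be compared with a tolerance rather than with ==. A minimal sketch, assuming an absolute tolerance of 1e-12 is acceptable for this data (that value is an arbitrary choice, not something NumPy prescribes):

```python
import numpy as np

np.random.seed(123)
a = np.empty(shape=(100, 2), dtype=float)
a[:, 0] = np.random.uniform()  # a single scalar, broadcast down the whole column
a[:, 1] = np.random.uniform()

# Every row equals the first row, so the exact standard deviation of each column is 0.
assert (a == a[0]).all()

std_axis = np.std(a, axis=0)
std_cols = np.array([np.std(a[:, j]) for j in range(a.shape[1])])

tol = 1e-12  # assumed absolute tolerance; choose one that suits the magnitude of your data
print(np.allclose(std_axis, std_cols, rtol=0.0, atol=tol))  # equal up to rounding noise
print(np.allclose(std_axis, 0.0, rtol=0.0, atol=tol))       # and both effectively 0
```

An absolute tolerance (atol) is used because the expected value is 0, where a purely relative tolerance (rtol) would be meaningless.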

answered May 16 '26 06:05 by modesitt